UMUCorpusClassifier: Compilation and evaluation of linguistic corpus for Natural Language Processing tasks

García-Díaz, José Antonio; Almela Sánchez-Lafuente, Ángela; Alcaraz Mármol, Gema; Valencia García, Rafael

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/109309

Item information
Title:	UMUCorpusClassifier: Compilation and evaluation of linguistic corpus for Natural Language Processing tasks
Alternative title:	UMUCorpusClassifier: Recolección y evaluación de corpus lingüísticos para tareas de Procesamiento del Lenguaje Natural
Author(s):	García-Díaz, José Antonio \| Almela Sánchez-Lafuente, Ángela \| Alcaraz Mármol, Gema \| Valencia García, Rafael
Keywords:	Corpus compilation \| Document classification \| Compilación de corpus \| Clasificación de documentos
Knowledge area(s):	Computer Languages and Systems (Lenguajes y Sistemas Informáticos)
Publication date:	Sep-2020
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Citation:	Procesamiento del Lenguaje Natural. 2020, 65: 139-142. https://doi.org/10.26342/2020-65-22
Abstract:	Developing an annotated corpus is a very time-consuming task. Although some researchers have proposed annotating corpora automatically using ad-hoc heuristics, valid hypotheses cannot always be made. Even when annotation is performed by human annotators, the quality of the corpus is heavily influenced by disagreements between annotators, or by an annotator's inconsistency with their own earlier decisions. A lack of supervision over the annotation process can therefore lead to a poor-quality corpus. In this work, we propose a demonstration of UMUCorpusClassifier, an NLP tool that helps researchers compile corpora and coordinate and supervise the annotation process. The tool eases daily supervision and makes it possible to detect deviations and inconsistencies during the early stages of the annotation process.
Sponsor(s):	This demonstration has been supported by the Spanish National Research Agency (AEI) and the European Regional Development Fund (FEDER/ERDF) through projects KBS4FIA (TIN2016-76323-R) and LaTe4PSP (PID2019-107652RB-I00). In addition, José Antonio García-Díaz has been supported by Banco Santander and the University of Murcia through the Doctorado industrial programme.
URI:	http://hdl.handle.net/10045/109309
ISSN:	1135-5948
DOI:	10.26342/2020-65-22
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	© Sociedad Española para el Procesamiento del Lenguaje Natural
Peer reviewed:	yes
Publisher's version:	https://doi.org/10.14198/10.26342/2020-65-22
Appears in collections:	Procesamiento del Lenguaje Natural - Nº 65 (2020)

Files in this item:
File	Description	Size	Format
PLN_65_22.pdf		1.01 MB	Adobe PDF
