Creación de un treebank de dependencias universales mediante recursos existentes para lenguas próximas: el caso del gallego

Carlos Gómez Rodríguez; Miguel Á. Alonso; Marcos García González

Authors: Carlos Gómez Rodríguez, Miguel Á. Alonso, Marcos García González
Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 57, 2016, pp. 33-40
Language: Spanish
Parallel titles:
- Building a UD treebank using existing resources from related languages: the case of Galician

Abstract
- Spanish
  In this work we present a new strategy for building treebanks for languages with few resources for syntactic parsing. The method consists of adapting and combining several treebanks annotated with Universal Dependencies from closely related linguistic varieties, with the aim of training a parser for the chosen language, in our case Galician. During the selection and adaptation of the source treebanks, we analyze the impact of properties at three different levels: (i) the distance between the source and target languages, (ii) the adaptation of lexical and orthographic features, and (iii) the annotation guidelines across treebanks. Using the proposed strategy, we trained a statistical parser to annotate, with promising results and without any prior Galician data, a small corpus of this language. Manual correction of this corpus, used as a gold standard, allowed us to test the effectiveness of the proposed method.
- English
  This paper presents a novel strategy for creating a Universal Dependencies (UD) treebank for a low-resource language. The method consists of adapting and combining different UD treebanks from related varieties in order to train a parser for the target language. More precisely, the paper explores the influence of three different levels on the selection and adaptation of the source treebanks: (i) the relatedness of the linguistic varieties, (ii) the adaptation of features based on lexical and spelling data, and (iii) the agreement in annotation criteria between different treebanks. The proposed strategy allowed us to train a parser that analyzes a small Galician corpus with promising results, even though no labeled data was previously available for this language. After a few bootstrapping iterations, we obtained a UD gold-standard corpus, which we used to evaluate the effectiveness of the proposed method.
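
As a rough illustration of the "adapt and combine" idea described in the abstract, the following Python sketch rewrites the FORM and LEMMA columns of a CoNLL-U treebank with a few spelling correspondences and then concatenates several source treebanks into one training file. Everything in it is an assumption made for illustration: the function names, the sample sentences, and the three Portuguese-to-Galician rules are hypothetical and are not taken from the paper, which does not specify its adaptation rules in this record.

```python
from typing import Iterable, List, Tuple

# Illustrative Portuguese-to-Galician spelling correspondences.
# Assumed for this sketch only; not the rules used in the paper.
SPELLING_RULES: List[Tuple[str, str]] = [("ção", "ción"), ("nh", "ñ"), ("lh", "ll")]


def adapt_spelling(word: str, rules=SPELLING_RULES) -> str:
    """Apply each (source, target) substring rewrite in order."""
    for src, tgt in rules:
        word = word.replace(src, tgt)
    return word


def adapt_conllu(text: str, rules=SPELLING_RULES) -> str:
    """Rewrite FORM (column 2) and LEMMA (column 3) of every token line.

    Comment lines, blank lines, and anything that does not have the ten
    tab-separated CoNLL-U columns are passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 10:
            out.append(line)
            continue
        cols[1] = adapt_spelling(cols[1], rules)
        cols[2] = adapt_spelling(cols[2], rules)
        out.append("\t".join(cols))
    return "\n".join(out)


def combine_treebanks(treebanks: Iterable[str]) -> str:
    """Concatenate treebanks, separating sentences with one blank line."""
    blocks = [tb.strip("\n") for tb in treebanks if tb.strip()]
    return "\n\n".join(blocks) + "\n\n"


def make_sentence(text: str, tokens: List[List[str]]) -> str:
    """Build a CoNLL-U sentence from a comment and ten-column token rows."""
    return "\n".join([f"# text = {text}"] + ["\t".join(t) for t in tokens])


# Hypothetical one-sentence source treebanks (Portuguese and Spanish).
PT_SAMPLE = make_sentence("O filho bebe vinho", [
    ["1", "O", "o", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "filho", "filho", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "bebe", "beber", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", "vinho", "vinho", "NOUN", "_", "_", "3", "obj", "_", "_"],
])
ES_SAMPLE = make_sentence("El hijo bebe", [
    ["1", "El", "el", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "hijo", "hijo", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "bebe", "beber", "VERB", "_", "_", "0", "root", "_", "_"],
])

if __name__ == "__main__":
    training_data = combine_treebanks([adapt_conllu(PT_SAMPLE), ES_SAMPLE])
    print(training_data)
```

The combined output could then be passed to any parser trainer that reads CoNLL-U. Plain substring rules like these over-apply easily, so a realistic adaptation step would probably need a lexicon or context-sensitive rules.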