Automatización del análisis sintáctico para el español con el fin de crear un treebank estandarizado [Automating syntactic parsing for Spanish in order to create a standardized treebank]

Minor Sandí Salazar; Gabriela Marín Raventós; Edgar Casasola

Sandí Salazar, Minor; Marín Raventós, Gabriela; Casasola Murillo, Edgar ^[1]
[1] Universidad de Costa Rica, Hospital, Costa Rica
Published in: Káñina: Revista de Artes y Letras de la Universidad de Costa Rica, ISSN 0378-0473, ISSN-e 2215-2636, Vol. 40, Extra No. 4, 2016 (special issue: Káñina número extraordinario), pp. 163-174
Language: Spanish
DOI: 10.15517/rk.v40i4.30232
Abstract
- The exponential growth of Spanish-language documents on the Internet offers varied opportunities for text analysis. Given their growing number and the scarcity of tools that support these processes, it is essential to build tools that automate them. Among such tools, treebanks play a prominent role, since they provide key information for many analysis processes. There is currently a trend toward standardizing morphological and syntactic tagging in order to create points of contact between treebanks from different research efforts. Building on prior work on the syntactic analysis of text, this study proposes a methodology for determining how far the process of creating treebanks can be automated, restricted to the Spanish language.