A hybrid approach to treebank construction

  • Authors: Montserrat Marimon Felipe, Lluís Padró Cirera
  • Source: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 49, 2012, pp. 139-146
  • Language: English
  • Abstract
    • Spanish (translated)

      This article describes research on the effects of morphosyntactic disambiguation used as a preprocessing step for an HPSG-based deep syntactic parser, in the context of developing an open-source Spanish treebank within the DELPH-IN environment. Treebank annotation is done manually, by choosing the appropriate decisions among the options proposed by the system and ranked by a statistical module. The experiments presented show that using a tagger reduces sentence ambiguity, helps limit the number of sentences whose parsing exceeds the time limit, and helps the statistical module rank the correct tree among the n best. On the one hand, our results confirm the benefits, already reported in the literature, of such preprocessing for deep parsing with respect to speed, coverage, and accuracy. On the other hand, we propose a strategy based on existing open-source tools and resources for developing highly consistent deep-syntax treebanks for languages with limited availability of linguistic resources.

    • English

      This paper describes research on the effects of PoS tagging as a preprocessing step for HPSG-based deep parsing, in the context of developing an open-source Spanish treebank in the DELPH-IN framework. The treebank is annotated by hand, by selecting the proper decisions among the choices proposed by the system and ranked by a statistical module. The experiments presented show that using a tagger lowers the ambiguity of the sentences, both reducing the number of sentences that time out before the entire parse forest is built and helping the ranker place the right tree among the n-best trees.

      On the one hand, our results validate the benefits (already reported in the literature) of such preprocessing for deep parsing with regard to speed, coverage, and accuracy. On the other hand, we propose a strategy based on existing open-source tools and resources to develop highly consistent deep-annotated treebanks for languages with limited availability of linguistic resources.

