Building a Nasa Yuwe Language Corpus and Tagging with a Metaheuristic Approach

  • Authors: Luz Marina Sierra Martínez, Carlos Alberto Cobos Lozada, Juan Carlos Corrales Muñoz, Tulio Rojas Curieux, Enrique Herrera Viedma, Diego Hernán Peluffo Ordoñez
  • Source: Computación y Sistemas (CyS), ISSN 1405-5546, ISSN-e 2007-9737, Vol. 22, No. 3, 2018, pp. 881-894
  • Language: English
  • DOI: 10.13053/cys-22-3-3018
  • Abstract
    • Nasa Yuwe is the language of the Nasa indigenous community in Colombia. It is currently threatened with extinction. In response, a range of computer science solutions have been developed to support the teaching and revitalization of the language. One of the most suitable approaches is the construction of a Part-of-Speech (POS) tagger, which facilitates the analysis and advanced processing of the language. For Nasa Yuwe, however, no tagged corpus exists, nor is there a POS tagger, and no related work has been reported. This paper therefore concentrates on building a tagged linguistic corpus for the Nasa Yuwe language and generating the first tagging application for Nasa Yuwe. The main results and findings are 1) the process of building the Nasa Yuwe corpus, 2) the tagsets and tagged sentences, as well as the statistics associated with the corpus, and 3) the results of two experiments to evaluate several POS taggers (a Random tagger; three versions of HSTAGger, a tagger based on the harmony search metaheuristic; and three versions of GBHS Tagger, a memetic algorithm based on Global-Best Harmony Search (GBHS), Hill Climbing and an explicit Tabu memory). The GBHS Tagger obtained the best results among the methods considered on the Nasa Yuwe language corpus.

The article metadata were obtained from SciELO México.
