Distancia diacrónica interlingüística: aplicación al portugués y el castellano

  • Authors: Pablo Gamallo Otero, Iñaki Alegría Loinaz, José Ramón Pichel Campos
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 63, 2019, pp. 77-84
  • Language: Spanish
  • Parallel titles:
    • Cross-lingual Diachronic Distance: Application to Portuguese and Spanish
  • Abstract
    • English

      The aim of this paper is to establish a corpus-based methodology for automatically measuring the cross-lingual distance between historical periods of two languages using perplexity. The corpus for each language was built ad hoc, with spelling as close as possible to the original, and represents fiction and non-fiction chronologically and in a balanced way. The methodology was applied to two related languages, Portuguese and Spanish, and their diachronic distances were measured both in the original orthography and in an automatically transcribed spelling.
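
      As a rough illustration of how perplexity can serve as a distance between two text samples, the sketch below trains a character n-gram model on one text and scores the other with it. It is not the authors' implementation: the n-gram order, the add-one smoothing, the symmetric averaging and all function names are assumptions chosen for brevity, since this record does not specify them.

```python
import math
from collections import Counter


def train_char_ngrams(text, n=3):
    """Count character n-grams and their (n-1)-character contexts."""
    padded = " " * (n - 1) + text
    ngrams = Counter(padded[i:i + n] for i in range(len(text)))
    contexts = Counter(padded[i:i + n - 1] for i in range(len(text)))
    vocab = set(padded)
    return ngrams, contexts, vocab


def perplexity(model, text, n=3):
    """Perplexity of `text` under an add-one smoothed n-gram model (assumed smoothing)."""
    ngrams, contexts, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen characters
    padded = " " * (n - 1) + text
    log_prob = 0.0
    for i in range(len(text)):
        gram = padded[i:i + n]
        p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + v)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(text))


def perplexity_distance(text_a, text_b, n=3):
    """Symmetric distance: mean of the two cross perplexities (an assumption)."""
    pp_ab = perplexity(train_char_ngrams(text_a, n), text_b, n)
    pp_ba = perplexity(train_char_ngrams(text_b, n), text_a, n)
    return (pp_ab + pp_ba) / 2


# Toy samples, not taken from the paper's corpus.
sample_pt = "o rato roeu a roupa do rei de roma"
sample_es = "el perro de san roque no tiene rabo"
distance = perplexity_distance(sample_pt, sample_es)
```

      In a diachronic setting, the same function would be applied to every pair of period-specific corpora (for example, one model per century and language), giving a matrix of distances that can be compared across periods and across the original and transcribed orthographies.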

