Distância diacrónica automática entre variantes diatópicas do português e do espanhol

  • Pichel, José Ramom [3] ; Gamallo, Pablo [1] ; Neves, Marco [2] ; Alegria, Iñaki [4]
    1. [1] Universidade de Santiago de Compostela, Santiago de Compostela, Spain
    2. [2] Universidade Nova de Lisboa, Socorro, Portugal
    3. [3] imaxin software
    4. [4] Universidade do País Basco (EHU/UPV)
  • Published in: Linguamática, ISSN 1647-0818, Vol. 12, No. 1, 2020, pp. 117-126
  • Language: Portuguese
  • DOI: 10.21814/lm.12.1.319
  • Parallel titles:
    • Automatic diachronic distance between diatopic variants of Portuguese and Spanish
  • Abstract
    • English

      This work applies a perplexity-based methodology to automatically calculate the cross-lingual distance between different historical periods of diatopic language variants. The methodology is applied to a purpose-built corpus in original spelling, balanced between fiction and non-fiction, to measure the historical distance between European and Brazilian Portuguese on the one hand, and European and Argentinian Spanish on the other. The results show very close distances between the diatopic varieties of Portuguese and of Spanish, in both original and automatically transcribed spelling, with slight convergences and divergences from the middle of the 20th century to the present. The method is unsupervised and can be applied to diatopic varieties of other languages.

    • Portuguese

      O objetivo deste trabalho é aplicar uma metodologia baseada na perplexidade, para calcular automaticamente a distância interlinguística entre diferentes períodos históricos de variantes diatópicas de idiomas. Esta metodologia aplica-se a um corpus construído adhoc em ortografia original, numa base equilibrada de ficção e não-ficção, que mede a distância histórica entre o português europeu e do Brasil, por um lado, e o espanhol europeu e o da Argentina, por outro. Os resultados mostram distâncias muito próximas em ortografia original e transcrita automaticamente, entre as variedades diatópicas do português e do espanhol, com ligeiras convergências/divergências desde meados do século XX até hoje. É de salientar que o método não é supervisionado e pode ser aplicado a outras variedades diatópicas de línguas.
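
The general idea of a perplexity-based distance can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the n-gram order, the smoothing scheme, or how the two directions are combined, so the character trigram model, the add-one smoothing, and the averaging of both directions below are assumptions chosen for brevity. The function names (`char_ngrams`, `train`, `perplexity`, `distance`) are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return one character n-gram per character of `text`, left-padded with spaces."""
    padded = " " * (n - 1) + text
    return [padded[i:i + n] for i in range(len(text))]


def train(text, n):
    """Count n-grams and their (n-1)-character contexts in the training text."""
    ngrams = Counter(char_ngrams(text, n))
    contexts = Counter()
    for gram, count in ngrams.items():
        contexts[gram[:-1]] += count
    vocab = set(text)
    return ngrams, contexts, vocab


def perplexity(model, text, n):
    """Perplexity of `text` under `model`, using add-one smoothing (an assumption)."""
    ngrams, contexts, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for characters unseen in training
    grams = char_ngrams(text, n)
    log_sum = 0.0
    for gram in grams:
        prob = (ngrams[gram] + 1) / (contexts[gram[:-1]] + v)
        log_sum += math.log(prob)
    return math.exp(-log_sum / len(grams))


def distance(text_a, text_b, n=3):
    """Symmetric distance: mean of the perplexities measured in both directions."""
    pp_ab = perplexity(train(text_a, n), text_b, n)
    pp_ba = perplexity(train(text_b, n), text_a, n)
    return 0.5 * (pp_ab + pp_ba)


if __name__ == "__main__":
    # Toy strings standing in for corpus samples of two varieties.
    sample_a = "o gato dorme na casa grande " * 20
    sample_b = "el gato duerme en la casa grande " * 20
    print(distance(sample_a, sample_a), distance(sample_a, sample_b))
```

A lower value means that a model trained on one text predicts the other more easily. No labelled data is needed, which is consistent with the abstract's description of the method as unsupervised. The sketch does not handle empty input.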

