Discursos presidenciales en Uruguay: enfoque desde el análisis estadístico de texto

Elena Vernazza; José Luis Vicente Villardón

Vernazza Mañana, Elena [1]; Vicente Villardón, José Luis [2]
[1] Instituto de Estadística, Facultad de Ciencias Económicas y de Administración, Universidad de la República
[2] Departamento de Estadística, Universidad de Salamanca
Published in: Cuadernos del CIMBAGE, ISSN-e 1669-1830, ISSN 1666-5112, Vol. 1, No. 23, 2021, pp. 21-46
Language: Spanish
DOI: 10.56503/cimbage/vol.1/nro.23(2021)p.21-46
Parallel titles:
- Presidential speeches in Uruguay from a textual data analysis approach
Abstract
  Textual data analysis originated in studies of literary works, notable early examples being a count of the words in the Bible and the first catalogue for classifying the books of a library. In the early 1900s these ideas were extended to the generation of summaries of long texts, using frequency analysis and measures of the relative significance of each term within a text. From a statistical perspective, the treatment of textual data became established with the emergence of Correspondence Factor Analysis, which was developed for handling linguistic data. Today, the statistical analysis of texts has expanded and is increasingly widely applied, no longer confined to literary works. Its main fields of application are market analysis, web search, journalism, psychology and education, sociology, and political science. The texts under analysis are structured jointly into a single element, and the statistical analysis of textual data is defined as the process of extracting information from that element. This paper analyzes, from this perspective, four presidential speeches from Uruguay, delivered by Julio María Sanguinetti and Tabaré Vázquez. The two come from different, traditionally opposed ideological and political sectors, and each was in the opposition during the other's term in office. The results are descriptive and multidimensional, and are complemented by visualizations, a tool typically used in the statistical analysis of textual data. The main results highlight the differences between the speeches, at the level of both the candidate and the period.
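As an illustration of the ideas named in the abstract, the sketch below structures a few texts jointly into a single element (a document-term matrix of word frequencies) and then weights each term by a measure of relative significance. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say which measure was used, so tf-idf is swapped in as one common choice, the function names `tokenize`, `document_term_matrix` and `tf_idf` are hypothetical, and the three short texts are invented placeholders rather than excerpts from the speeches.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    # [^\W\d_]+ matches runs of Unicode letters, so accented words survive.
    return re.findall(r"[^\W\d_]+", text.lower())


def document_term_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in docs[i]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[count[term] for term in vocab] for count in counts]
    return vocab, rows


def tf_idf(vocab, rows):
    """Weight each count as (relative frequency) * log(N / document frequency).

    This is one common tf-idf variant, assumed here for illustration only.
    """
    n_docs = len(rows)
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    weights = []
    for row in rows:
        total = sum(row) or 1  # guard against empty documents
        weights.append(
            [(row[j] / total) * math.log(n_docs / doc_freq[j])
             for j in range(len(vocab))]
        )
    return weights


# Invented placeholder texts (assumption: NOT taken from the analyzed speeches).
toy_docs = [
    "the country needs work and education",
    "the country needs security and work",
    "education and health for the country",
]

vocab, rows = document_term_matrix(toy_docs)
weights = tf_idf(vocab, rows)

# Show the three most distinctive terms of each toy document.
for i, row in enumerate(weights):
    top = sorted(zip(vocab, row), key=lambda pair: pair[1], reverse=True)[:3]
    print(i, [(term, round(weight, 3)) for term, weight in top])
```

Terms that occur in every document receive a weight of zero under this variant, which is why a frequency table alone and a relative-significance table can rank the same vocabulary quite differently.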