Multiclass classification and visualization of complaints about official organizations on Twitter

Beatriz Hernández Pajares; Diana Pérez Marín; Vanessa Frías Martínez

Hernández-Pajares, Beatriz [2]; Pérez-Marín, Diana [1]; Frías-Martínez, Vanessa [3]
1. [1] Universidad Rey Juan Carlos, Madrid, Spain
2. [2] Centro de Inteligencia Artificial, Wavespace, Spain
3. [3] University of Maryland, United States
Published in: TecnoLógicas, ISSN-e 2256-5337, ISSN 0123-7799, No. 47, 2020, pp. 109-120
Language: Spanish
DOI: 10.22430/22565337.1454
Parallel titles:
- Visualization and Multiclass Classification of Complaints to Official Organisms on Twitter
Abstract
- Spanish
  Social networks accumulate large amounts of information. Current Natural Language Processing techniques make it possible to process this information automatically, and Data Mining techniques make it possible to extract useful data from the collected and processed information. However, a review of the state of the art shows that most methods for classifying data identified and extracted from social networks are two-class (binary). This is not sufficient for some classification areas, in which more than two classes must be considered. This article presents a comparative study of the SVM and Random Forests methods for the automatic identification of n classes in social-network microblogging. The data collected automatically for the study consist of 190,000 tweets from four official bodies: Metro, Civil Protection, Police, and the Government of Mexico. Based on the results, Random Forests is recommended, since it achieves an average precision of 81.46 % and an average coverage of 59.88 %, with nine types of complaints identified automatically.
- English
  Social networks generate massive amounts of information. Current Natural Language Processing techniques allow the automatic processing of that information, and Data Mining enables the automatic extraction of useful information. However, a state-of-the-art review reveals that most classification methods distinguish only two classes. This paper presents a procedure to automatically classify tweets into several classes (more than two). The steps of the procedure are described in detail so that any researcher can follow them. The accuracy and coverage (instead of only coverage, as is usual in the literature) of two automatic classifiers (SVM and Random Forests) were analyzed in a comparative study. The procedure was applied to automatically identify nine types of complaint in 190,000 tweets. According to the results, Random Forests should be used because they achieve an average accuracy of 81.46 % and an average coverage of 59.88 %.
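The abstracts report an average precision of 81.46 % (rendered as "accuracy" in the English abstract) and an average coverage of 59.88 % over the complaint classes, but this record does not say how those averages were computed. The sketch below assumes a macro average, i.e. the unweighted mean of the per-class scores, and reads "coverage" as recall: the share of tweets of a class that the classifier actually assigns to that class. Both readings are assumptions, and the function names and class labels are hypothetical.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_count = Counter(y_true)  # tweets that really belong to each class
    pred_count = Counter(y_pred)  # tweets the classifier assigned to each class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # Convention (assumption): a ratio with a zero denominator counts as 0.0.
        precision = hits[label] / pred_count[label] if pred_count[label] else 0.0
        recall = hits[label] / true_count[label] if true_count[label] else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision and recall (macro average)."""
    scores = per_class_precision_recall(y_true, y_pred)
    if not scores:
        raise ValueError("no labels to average over")
    n = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / n
    mean_recall = sum(r for _, r in scores.values()) / n
    return mean_precision, mean_recall
```

Counting a class that is never predicted as precision 0 is only one convention; other tools warn or skip such classes, which changes the average. A weighted average (by class frequency) would also give different figures on imbalanced data.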
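The abstracts contrast two-class (binary) methods with classification into n classes. One common, generic way to build an n-class classifier out of binary ones is one-vs-rest: train one binary model per class (that class against all others) and predict the class whose model scores highest. Random Forests, by contrast, handle several classes natively. The sketch below uses a toy bag-of-words perceptron as the binary learner purely to illustrate the reduction; it is not the SVM or Random Forests configuration evaluated in the paper, and the tokenizer, class names, and example tweets are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Naive bag-of-words tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class BinaryPerceptron:
    """Toy linear binary classifier over word counts; labels are +1 / -1."""

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.weights = defaultdict(float)
        self.bias = 0.0

    def score(self, tokens):
        # Larger scores mean "more likely the positive class".
        return self.bias + sum(self.weights.get(t, 0.0) for t in tokens)

    def fit(self, docs, labels):
        for _ in range(self.epochs):
            for tokens, y in zip(docs, labels):
                if y * self.score(tokens) <= 0:  # wrong side of (or on) the boundary
                    for t in tokens:
                        self.weights[t] += y
                    self.bias += y
        return self


class OneVsRest:
    """Reduce an n-class problem to n binary problems: each class vs. all others."""

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.models = {}

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        for cls in sorted(set(labels)):
            binary = [1 if y == cls else -1 for y in labels]
            self.models[cls] = BinaryPerceptron(self.epochs).fit(docs, binary)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        # Choose the class whose binary model gives the highest score.
        return max(self.models, key=lambda cls: self.models[cls].score(tokens))
```

For example, after `OneVsRest().fit(texts, labels)` on a few labelled tweets such as "metro train late" (metro) or "earthquake alert shelter" (civil), `predict("the metro is late again")` returns the class whose binary model is most confident. In a real pipeline the toy perceptron would be replaced by a stronger binary learner, such as an SVM.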