Documat


Spanish hate-speech detection in football

  • Authors: Gema Alcaraz Mármol, Rafael Valencia García, Esteban Montesinos, Francisco García Sánchez, José Antonio García Díaz
  • Source: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 71, 2023, pp. 15-27
  • Language: English
  • Parallel titles:
    • Detección de odio en futbol en español
  • Abstract
    • Spanish

      En los últimos años, el Procesamiento del Lenguaje Natural (PLN) se ha aplicado con éxito a diversas tareas, como la elaboración de perfiles de autor, la detección de negaciones o la detección de discursos de odio. Para la identificación de odio a partir de texto, es posible explotar modelos del lenguaje preentrenados que permitan construir clasificadores de alto rendimiento utilizando un enfoque de aprendizaje por transferencia (en inglés, transfer learning). En este trabajo, se presentan los resultados de entrenar y evaluar clasificadores preentrenados de última generación basados en Transformers. Los modelos explorados se ajustan (en inglés, fine tune) utilizando un corpus en español sobre el discurso de odio en el futbol que se ha compilado como parte de esta investigación. El corpus contiene un total de 7.483 tuits relacionados con el futbol que han sido anotados manualmente bajo cuatro categorías: agresivo, racista, misógino y seguro. Se utilizó un enfoque multietiqueta, que permite etiquetar el mismo tuit con más de una clase. Los mejores resultados, con un macro F1-score del 88,713%, se han obtenido mediante una combinación de los modelos utilizando la estrategia de Knowledge Integration.

    • English

      In the last few years, Natural Language Processing (NLP) tools have been successfully applied to a number of different tasks, including author profiling, negation detection or hate speech detection, to name but a few. For the identification of hate speech from text, pre-trained language models can be leveraged to build high-performing classifiers using a transfer learning approach. In this work, we train and evaluate state-of-the-art pre-trained classifiers based on Transformers. The explored models are fine-tuned using a hate speech corpus in Spanish that has been compiled as part of this research. The corpus contains a total of 7,483 football-related tweets that have been manually annotated under four categories: aggressive, racist, misogynist, and safe. A multi-label approach is used, allowing the same tweet to be labeled with more than one class. The best results, with a macro F1-score of 88.713%, have been obtained by a combination of the models using Knowledge Integration.
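The abstract reports a macro F1-score under a multi-label scheme with four categories. As a minimal sketch of how such a score can be computed, the Python below binarizes each tweet's label set and averages the per-label F1 values. This is not the authors' evaluation code: the helper names (`binarize`, `f1_per_label`, `macro_f1`) and the four toy annotations are hypothetical, and only the category names come from the abstract.

```python
from typing import List, Set

# Category names taken from the abstract; everything else is illustrative.
LABELS = ["aggressive", "racist", "misogynist", "safe"]


def binarize(tags: Set[str]) -> List[int]:
    """Turn a set of labels into a 0/1 vector ordered like LABELS."""
    return [1 if label in tags else 0 for label in LABELS]


def f1_per_label(y_true: List[List[int]], y_pred: List[List[int]], idx: int) -> float:
    """F1 for one label column; returns 0.0 when the label never occurs."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t[idx] == 1 and p[idx] == 1)
    fp = sum(1 for t, p in pairs if t[idx] == 0 and p[idx] == 1)
    fn = sum(1 for t, p in pairs if t[idx] == 1 and p[idx] == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = [f1_per_label(y_true, y_pred, i) for i in range(len(LABELS))]
    return sum(scores) / len(scores)


# Hypothetical gold annotations and predictions for four tweets.
# A tweet may carry more than one label (multi-label setting).
gold = [{"aggressive", "racist"}, {"safe"}, {"misogynist"}, {"aggressive"}]
pred = [{"aggressive"}, {"safe"}, {"misogynist", "aggressive"}, {"aggressive"}]

y_true = [binarize(tags) for tags in gold]
y_pred = [binarize(tags) for tags in pred]

print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because the macro average weights every category equally, a rarely occurring label that the classifier misses (here, "racist") lowers the score as much as a frequent one would.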

