Ir al contenido

Documat


Attention-based approaches for text analytics in social media and automatic summarization

  • Autores: José Ángel González Barba
  • Directores de la Tesis: Emilio Sanchís Arnal (dir. tes.) Árbol académico, Lluís Felip Hurtado Oliver (dir. tes.) Árbol académico
  • Lectura: En la Universitat Politècnica de València ( España ) en 2021
  • Idioma: español
  • Tribunal Calificador de la Tesis: María Teresa Martín Valdivia (presid.) Árbol académico, Vicente J. Botti Navarro (secret.) Árbol académico, Davide Buscaldi (voc.) Árbol académico
  • Enlaces
    • Tesis en acceso abierto en: RiuNet
  • Resumen
    • Nowadays, society has access, and the possibility to contribute, to large amounts of the content present on the internet, such as social networks, online newspapers, forums, blogs, or multimedia content platforms. These platforms have had, during the last years, an overwhelming impact on the daily life of individuals and organizations, becoming the predominant ways for sharing, discussing, and analyzing online content. Therefore, it is very interesting to work with these platforms, from different points of view, under the umbrella of Natural Language Processing. In this thesis, we focus on two broad areas inside this field, applied to analyze online content: text analytics in social media and automatic summarization. Neural networks are also a central topic in this thesis, where all the experimentation has been performed by using deep learning approaches, mainly based on attention mechanisms. Besides, we mostly work with the Spanish language, due to it is an interesting and underexplored language with a great interest in the research projects we participated in.

      On the one hand, for text analytics in social media, we focused on affective analysis tasks, including sentiment analysis and emotion detection, along with the analysis of the irony. In this regard, an approach based on Transformer Encoders, based on contextualizing pretrained Spanish word embeddings from Twitter, to address sentiment analysis and irony detection tasks, is presented. We also propose the use of evaluation metrics as loss functions, in order to train neural networks for reducing the impact of the class imbalance in multi-class and multi-label emotion detection tasks. Additionally, a specialization of BERT both for the Spanish language and the Twitter domain, that takes into account inter-sentence coherence in Twitter conversation flows, is presented. The performance of all these approaches has been tested with different corpora, from several reference evaluation benchmarks, showing very competitive results in all the tasks addressed.

      On the other hand, we focused on extractive summarization of news articles and TV talk shows. Regarding the summarization of news articles, a theoretical framework for extractive summarization, based on siamese hierarchical networks with attention mechanisms, is presented. Also, we present two instantiations of this framework: Siamese Hierarchical Attention Networks and Siamese Hierarchical Transformer Encoders. These systems were evaluated on the CNN/DailyMail and the NewsRoom corpora, obtaining competitive results in comparison to other contemporary extractive approaches. Concerning the TV talk shows, we proposed a text summarization task, for summarizing the transcribed interventions of the speakers, about a given topic, in the Spanish TV talk shows of the ``La Noche en 24 Horas" program. In addition, a corpus of news articles, collected from several Spanish online newspapers, is proposed, in order to study the domain transferability of siamese hierarchical approaches, between news articles and interventions of debate participants. This approach shows better results than other extractive techniques, along with a very promising domain transferability.


Fundación Dialnet

Mi Documat

Opciones de tesis

Opciones de compartir

Opciones de entorno