
On Evaluating the Contribution of Text Normalisation Techniques to Sentiment Analysis on Informal Web 2.0 Texts

  • Authors: Alejandro Mosquera López, Paloma Moreda Pozo, Yoan Gutiérrez Vázquez
  • Source: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 58, 2017, pp. 29-36
  • Language: English
  • Parallel titles:
    • Evaluación de la Contribución de la Normalización al Análisis de Sentimiento en Textos Informales de la Web 2.0
  • Abstract
    • Spanish (translated)

      The type of language used on social networks often includes informal elements that can affect the performance of natural language processing tools. Lexical normalisation techniques are one of the options that have been used when dealing with Web 2.0 content. However, not all texts require such pre-processing, since they can exhibit different levels of informality. In this work we explore the impact of applying lexical normalisation by evaluating the results of a sentiment analysis system before and after normalisation. The results of our research show an improvement of more than 2.6% in F1 for the most informal texts.

    • English

      The writing style used in social media usually contains informal elements that can lower the performance of Natural Language Processing applications. For this reason, text normalisation techniques have recently drawn a lot of attention as a way of dealing with informal content. However, not all texts present the same level of informality, and some may not require additional pre-processing steps. Therefore, in this paper we explore the effect of applying lexical normalisation to a sentiment analysis classification task on Web 2.0 texts. The results show an improvement of more than 2.6% in average F1 for the most informal data.
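
      The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. It assumes that "average F1" means the unweighted (macro) average of per-class F1, which the abstract does not specify. The function names and the four toy labels are hypothetical and exist only to show the arithmetic; they are not the paper's data or results.

```python
from typing import Sequence


def f1_per_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one class, computed from true/false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed reading of 'average F1')."""
    labels = sorted(set(gold))
    return sum(f1_per_class(gold, pred, label) for label in labels) / len(labels)


# Toy, made-up labels (NOT from the paper): the same classifier run on
# raw text and on normalised text.
gold = ["pos", "neg", "pos", "neg"]
pred_raw = ["pos", "pos", "pos", "neg"]          # hypothetical output on raw text
pred_normalised = ["pos", "neg", "pos", "neg"]   # hypothetical output after normalisation

improvement = macro_f1(gold, pred_normalised) - macro_f1(gold, pred_raw)
print(f"macro-F1 change after normalisation: {improvement:+.4f}")
```

      Running the same comparison separately on subsets grouped by informality level would mirror the abstract's observation that the gain is concentrated in the most informal texts.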

