A Cross-domain and Cross-language Knowledge-based Representation of Text and its Meaning

Marc Franco-Salvador



Authors: Marc Franco-Salvador
Thesis advisors: Paolo Rosso (thesis advisor)
Defense: At the Universitat Politècnica de València (Spain) in 2017
Language: Spanish
Thesis committee: Simone Paolo Ponzetto (chair), Nicola Ferro (secretary), Bernardo Magnini (member)
Links
- Open-access thesis at: RiuNet
Abstract
- Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human languages. One of its most challenging aspects involves enabling computers to derive meaning from human natural language. To do so, several meaning or context representations have been proposed with competitive performance. However, these representations still have room for improvement when working in a cross-domain or cross-language scenario.
  
  In this thesis we study the use of knowledge graphs as a cross-domain and cross-language representation of text and its meaning. A knowledge graph is a graph that expands and relates the original concepts belonging to a set of words. We obtain its characteristics using a wide-coverage multilingual semantic network as the knowledge base. This provides coverage of hundreds of languages and millions of general and domain-specific concepts.
  
  As a starting point for our research, we employ knowledge graph-based features, along with other traditional ones and meta-learning, for the NLP task of single- and cross-domain polarity classification. The analysis and conclusions of that work provide evidence that knowledge graphs capture meaning in a domain-independent way.
  
  The next part of our research takes advantage of the multilingual semantic network and focuses on cross-language Information Retrieval (IR) tasks. First, we propose a fully knowledge graph-based model of similarity analysis for cross-language plagiarism detection. Next, we improve that model to cover out-of-vocabulary words and verb tenses and apply it to cross-language document retrieval, categorisation, and plagiarism detection. Finally, we study the use of knowledge graphs for the NLP tasks of community question answering, native language identification, and language variety identification.
  
  The contributions of this thesis demonstrate the potential of knowledge graphs as a cross-domain and cross-language representation of text and its meaning for NLP and IR tasks. These contributions have been published in several international conferences and journals.
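
The idea of a knowledge graph that "expands and relates the original concepts belonging to a set of words" can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy semantic network, the toy lexicon, the concept identifiers, the function names, and the Jaccard overlap used as a similarity measure are all hypothetical and are not taken from the thesis, which relies on a wide-coverage multilingual semantic network and its own similarity model.

```python
from collections import deque

# Hypothetical toy semantic network: concept -> related concepts.
# A real system would query a wide-coverage multilingual resource instead.
TOY_NETWORK = {
    "c:bank_finance": {"c:money", "c:institution"},
    "c:money": {"c:bank_finance", "c:loan"},
    "c:loan": {"c:money"},
    "c:institution": {"c:bank_finance"},
    "c:river": {"c:water"},
    "c:water": {"c:river"},
}

# Hypothetical multilingual lexicon: (language, word) -> concepts.
# Words from different languages map to the same language-independent concepts.
TOY_LEXICON = {
    ("en", "bank"): {"c:bank_finance"},
    ("en", "loan"): {"c:loan"},
    ("en", "river"): {"c:river"},
    ("es", "banco"): {"c:bank_finance"},
    ("es", "préstamo"): {"c:loan"},
}


def build_knowledge_graph(words, lang, max_depth=1):
    """Return (nodes, edges) reachable from the words' concepts within max_depth hops."""
    seeds = set()
    for word in words:
        seeds |= TOY_LEXICON.get((lang, word), set())
    nodes = set(seeds)
    edges = set()
    queue = deque((concept, 0) for concept in sorted(seeds))
    while queue:
        concept, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in sorted(TOY_NETWORK.get(concept, ())):
            edges.add(frozenset((concept, neighbour)))  # undirected relation
            if neighbour not in nodes:
                nodes.add(neighbour)
                queue.append((neighbour, depth + 1))
    return nodes, edges


def graph_similarity(graph_a, graph_b):
    """Jaccard overlap of the node sets of two knowledge graphs (an assumed measure)."""
    nodes_a, nodes_b = graph_a[0], graph_b[0]
    union = nodes_a | nodes_b
    return len(nodes_a & nodes_b) / len(union) if union else 0.0


english = build_knowledge_graph(["bank", "loan"], "en")
spanish = build_knowledge_graph(["banco", "préstamo"], "es")
unrelated = build_knowledge_graph(["river"], "en")
print(graph_similarity(english, spanish))
print(graph_similarity(english, unrelated))
```

Because both texts are mapped onto shared concept identifiers before they are compared, the English and Spanish word lists yield overlapping graphs even though they share no surface words, which is the intuition behind using such graphs as a cross-language representation.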