
Abstract of The use of context vectors for word sense disambiguation within the ELDIT Dictionary

Kateryna Ignatova, Andrea Abel

  • The aim of this paper is to tackle the problem of Word Sense Disambiguation (WSD) within the ELDIT system. ELDIT (Elektronisches Lernwörterbuch Deutsch-Italienisch) is an online dictionary of German and Italian, as well as a web-based language-learning system targeted at language learners at elementary and intermediate levels. In ELDIT, each word is linked to the corresponding dictionary entry, which contains a list of senses. However, selecting the suitable sense of a polysemous word, as well as choosing the appropriate homonym during lookup, is not a trivial task, especially for language learners at elementary level.

    Therefore, it is desirable to make working with the dictionary easier by automatically selecting the right sense of a word in a given context, which is a Word Sense Disambiguation task. While WSD has been studied intensively in fields such as Information Retrieval (IR), Machine Translation (MT), and Question Answering (QA), we present a novel setting in which WSD is performed within an integrated dictionary system. To perform WSD, we first use the different kinds of knowledge contained in the ELDIT dictionary, namely part-of-speech information, morphological knowledge, collocation patterns, and various example sentences, as the basis for the context vectors technique. In addition, when the ELDIT dictionary does not provide sufficient data for building a context vector for a word, we fall back on the vast knowledge available on the Internet. By combining all these sources of information, the implemented module is able to automatically choose the most appropriate meaning of a word in a particular context. It achieves an average precision of 96% for disambiguating Italian homonyms and 93% for disambiguating German homonyms. The results for polysemous words depend greatly on how distinct the senses are and how many senses a word has. The evaluation, however, has shown that the approach we apply always outperforms the baseline system (a simplified Lesk algorithm) and gives quite promising results. Finally, we show that the data obtained during our work can be reused in a number of interesting tasks to further improve the ELDIT system.
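
    The context vectors idea mentioned above can be illustrated with a minimal sketch. It is not the ELDIT module: it assumes raw word counts as vector components and cosine similarity as the comparison measure, neither of which the abstract specifies. The function names, the sense labels, and the German example sentences are all hypothetical and are not taken from the ELDIT dictionary.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters only (Unicode-aware, so umlauts survive).
    return re.findall(r"[^\W\d_]+", text.lower())


def context_vector(sentences, target):
    # Bag-of-words counts over all sentences, leaving out the target word itself,
    # since it occurs in every example and carries no distinguishing information.
    counts = Counter()
    for sentence in sentences:
        counts.update(tok for tok in tokenize(sentence) if tok != target.lower())
    return counts


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(target, context, sense_examples):
    # Assumption: each sense is represented by the example sentences listed
    # for it; the sense whose vector is closest to the context vector wins.
    ctx = context_vector([context], target)
    scores = {
        sense: cosine(ctx, context_vector(examples, target))
        for sense, examples in sense_examples.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical example data for the German homonym "Bank".
SENSES = {
    "Bank (financial institution)": [
        "Ich bringe mein Geld zur Bank.",
        "Die Bank gibt mir einen Kredit.",
    ],
    "Bank (bench)": [
        "Wir sitzen auf der Bank im Park.",
        "Die Bank im Garten ist aus Holz.",
    ],
}

best, scores = disambiguate(
    "Bank", "Er sitzt auf einer Bank im Park und liest.", SENSES
)
print(best)
```

    In this toy setting the context shares the words "auf", "im" and "Park" with the bench examples and nothing with the financial examples, so the bench sense scores higher. A real system would presumably also need lemmatization, stop-word handling, and the part-of-speech and collocation filters the abstract mentions.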

