Misspelled Queries in Multilingual IR: Analysis and Treatment

David Vilares Calvo; Adrián Blanco González; Jesús Vilares

Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 51, 2013, pp. 25-32
Language: Spanish
Abstract
- Spanish (English translation)
  This paper studies the impact of spelling errors in queries on the performance of cross-language information retrieval systems and proposes two strategies for dealing with them: the use of automatic spelling correction techniques, and the use of character n-grams as both index terms and translation units in order to take advantage of their inherent robustness.
  
  The results demonstrate the sensitivity of these systems to such errors as well as the effectiveness of the proposed solutions. To the best of our knowledge, no similar work exists in the cross-language field.
- English
  This paper studies the impact of misspelled queries on the performance of Cross-Language Information Retrieval systems and proposes two strategies for dealing with them: the use of automatic spelling correction techniques, and the use of character n-grams as both index terms and translation units, thus taking advantage of their inherent robustness. Our results demonstrate the sensitivity of these systems to such errors and the effectiveness of the proposed solutions. To the best of our knowledge, there is no similar work in the cross-language field.
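The two strategies named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: plain Levenshtein distance against a small word list stands in for whatever correction algorithm the paper actually uses, and the function names, the example lexicon, the choice of n = 4 and the distance threshold of 2 are all assumptions made for illustration. The point shown is that a single-character typo breaks an exact word match, yet most of the word's character n-grams survive, and a dictionary-based corrector can often map the typo back to the intended word.

```python
from collections import Counter


def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Overlapping character n-grams of one word (n = 4 is an assumed default).

    Words no longer than n are kept whole, a simplification of this sketch.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_overlap(a: str, b: str, n: int = 4) -> float:
    """Dice coefficient between the n-gram multisets of two words (1.0 = identical)."""
    ga, gb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    shared = sum((ga & gb).values())  # multiset intersection keeps minimum counts
    return 2 * shared / (sum(ga.values()) + sum(gb.values()))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def correct(word: str, lexicon: list[str], max_dist: int = 2) -> str:
    """Hypothetical corrector: replace an unknown word by its nearest lexicon entry.

    The word is left unchanged if no entry lies within max_dist edits.
    """
    if word in lexicon:
        return word
    best = min(lexicon, key=lambda w: levenshtein(word, w))
    return best if levenshtein(word, best) <= max_dist else word


if __name__ == "__main__":
    lexicon = ["information", "retrieval", "translation", "query"]  # toy lexicon
    typo = "informatiom"
    print(typo == "information")                  # exact word match fails
    print(ngram_overlap(typo, "information"))     # 7 of 8 4-grams still shared
    print(correct(typo, lexicon))                 # nearest entry within 2 edits
```

Running the script prints `False`, `0.875` and `information`: the misspelled word no longer matches as a whole term, but it still shares 7 of its 8 4-grams with the correct form, and the corrector recovers it at edit distance 1.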