Creating the best development corpus for Statistical Machine Translation systems

  • Authors: Mara Chinea Rios, Germán Sanchis Trilles, Francisco Casacuberta Nolla
  • Published in: Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain / edited by Juan Antonio Pérez Ortiz, Felipe Sánchez Martínez, Miquel Esplà Gomis, Maja Popovic, Celia Rico Pérez, André Martins, Joachim Van den Bogaert, Mikel L. Forcada Zubizarreta, 2018, ISBN 978-84-09-01901-4, pp. 99-108
  • Language: English
  • Abstract
    • We propose and study three novel approaches to the problem of development set selection in Statistical Machine Translation. We focus on a scenario where a machine translation system is used to translate a specific test set, with no further data available from the domain at hand. This test set stems from a real application of machine translation, in which the texts of a specific e-commerce site were to be translated. To develop our development-set selection techniques, we first conducted experiments in a controlled scenario, where labelled data from different domains were available, and evaluated the techniques with both classification and translation quality metrics. The best-performing techniques were then evaluated on the e-commerce data at hand, yielding consistent improvements across two language directions.

