
Documat


Abstract of Creating the best development corpus for Statistical Machine Translation systems

Mara Chinea Rios, Germán Sanchis Trilles, Francisco Casacuberta Nolla

  • We propose and study three novel approaches to development-set selection in Statistical Machine Translation. We focus on a scenario in which a machine translation system must translate a specific test set and no further data from the target domain is available. The test set stems from a real application of machine translation: translating the texts of a specific e-commerce site. To develop our selection techniques, we first conducted experiments in a controlled scenario where labelled data from different domains was available, and evaluated the techniques with both classification and translation quality metrics. The best-performing techniques were then evaluated on the e-commerce data, yielding consistent improvements across two language directions.

