Feature decay algorithms for neural machine translation

Authors: Alberto Poncelas, Gideon Maillette de Buy Wenniger, Andy Way
Published in: Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain / coordinated by Juan Antonio Pérez Ortiz, Felipe Sánchez Martínez, Miquel Esplà Gomis, Maja Popovic, Celia Rico Pérez, André Martins, Joachim Van den Bogaert, Mikel L. Forcada Zubizarreta, 2018, ISBN 978-84-09-01901-4, pp. 239-248
Language: English

Abstract
- Neural Machine Translation (NMT) systems require large amounts of data to be competitive. For this reason, data selection techniques are used only for fine-tuning systems that have already been trained on larger amounts of data. In this work we aim to use Feature Decay Algorithms (FDA) data selection techniques not only to fine-tune a system but also to build a complete system with less data. Our findings reveal that it is possible to find a subset of sentence pairs that, when used to train a German-English NMT system, outperforms the full training corpus by 1.11 BLEU points.