Francisco Casacuberta Nolla (thesis supervisor)
María Inés Torres Barañano (committee secretary)
Felipe Sánchez Martínez (committee member)
Neural Machine Translation (NMT) faces significant challenges in adapting to the ever-evolving nature of human language. Traditional NMT models struggle to integrate new vocabulary terms and are prone to forgetting previously learned information when trained on new data. This work addresses these challenges by exploring methods to enable Continual Learning (CL) in NMT, focusing on the open-vocabulary problem and the Catastrophic Forgetting (CF) phenomenon.
To tackle the open-vocabulary problem, we introduce quasi-character-level vocabularies, which combine the flexibility of character-level models with the efficiency of higher-level encodings. This approach allows NMT models to represent any word, including unseen or rare ones, without excessively increasing sequence lengths, thereby improving model generalization and performance. Additionally, we propose incremental and continual vocabularies that leverage compositional embeddings to integrate new words seamlessly into our NMT models without the need for retraining.
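As a minimal sketch of the compositional idea, the snippet below builds a word embedding from hashed character n-gram vectors, so unseen words receive a representation without retraining. The class, bucket count, and n-gram size are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class CompositionalEmbedding(nn.Module):
    """Embed any word as the mean of its character n-gram vectors.

    Hypothetical sketch: a hashed n-gram table replaces a fixed,
    closed vocabulary, so new words never require retraining.
    """
    def __init__(self, num_buckets=100_000, dim=512, n=3):
        super().__init__()
        self.n = n
        self.num_buckets = num_buckets
        self.ngram_emb = nn.Embedding(num_buckets, dim)

    def ngrams(self, word):
        padded = f"<{word}>"  # boundary markers distinguish prefixes/suffixes
        return [padded[i:i + self.n] for i in range(len(padded) - self.n + 1)]

    def forward(self, word):
        ids = torch.tensor([hash(g) % self.num_buckets for g in self.ngrams(word)])
        # Averaging n-gram vectors yields a word embedding even for
        # words never seen during training.
        return self.ngram_emb(ids).mean(dim=0)

emb = CompositionalEmbedding()
vec = emb("transformerization")  # unseen word, still representable
print(vec.shape)  # torch.Size([512])
```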
To address the CF problem, we first investigate the influence of vocabulary domain and size on the model's retention capabilities. Next, we explore rehearsal strategies, demonstrating that minimal amounts of past data can significantly reduce forgetting during training. Furthermore, to improve knowledge retention without relying heavily on past data, we propose a regularization strategy that combines few-shot rehearsal with loss penalties, balancing the learning of new tasks against preserving performance on previous ones.
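The training step below sketches how few-shot rehearsal and a loss penalty might combine. The function name, `replay_buffer.sample`, and the plain L2 drift penalty are assumptions for illustration (the thesis's penalty may weight parameters differently, e.g. EWC-style importances).

```python
import torch

def continual_step(model, new_batch, replay_buffer, old_params,
                   loss_fn, optimizer, lam=0.1, k=4):
    """One update mixing few-shot rehearsal with a regularization penalty.

    Hypothetical sketch: `replay_buffer` holds a handful of past-task
    examples; `lam` weights an L2 term pulling parameters toward their
    pre-adaptation values. Names are illustrative, not the thesis's API.
    """
    src, tgt = new_batch
    loss = loss_fn(model(src), tgt)

    # Few-shot rehearsal: replay k stored examples from earlier tasks.
    for old_src, old_tgt in replay_buffer.sample(k):
        loss = loss + loss_fn(model(old_src), old_tgt)

    # Loss penalty: discourage drift from the previous task's weights.
    penalty = sum(((p - p_old) ** 2).sum()
                  for p, p_old in zip(model.parameters(), old_params))
    loss = loss + lam * penalty

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```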
Finally, we explore parameter-efficient adaptation methods to enable effective task-switching strategies in NMT. We find that, by training only a small number of parameters, these methods allow NMT models to adapt to new domains, styles, or even languages without substantial computational overhead or performance degradation on prior tasks. To complement this research, we also derive a gradient-based regularization technique for low-rank matrices that facilitates the integration of new knowledge while mitigating the CF problem.
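A minimal sketch of such a parameter-efficient adapter follows, assuming a LoRA-style low-rank update on a frozen linear layer: only the small matrices A and B are trained per task, so task switching amounts to swapping adapters. The rank, scaling, and class name are assumptions; the thesis's gradient-based regularizer for these matrices is not reproduced here.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """LoRA-style trainable low-rank update over a frozen base layer.

    Hypothetical sketch: freezing the base weights preserves prior-task
    knowledge, while the rank-r delta (r << d) adapts to the new task.
    """
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # prior-task knowledge stays intact
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: no drift at start
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the trainable low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LowRankAdapter(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))  # (2, 512): frozen base + adapted delta
```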
Overall, this work advances the field of continual learning in NMT by providing practical and thoroughly validated solutions to the open-vocabulary and catastrophic forgetting problems, paving the way for more adaptive and efficient NMT systems capable of responding to the evolving demands of natural language translation tasks.