
Documat


Itzulpen automatiko gainbegiratu gabea [Unsupervised machine translation]

  • Author: Mikel Artetxe Zurutuza
  • Thesis supervisors: Gorka Labaka Intxauspe, Eneko Agirre Bengoa
  • Defense: at the Universidad del País Vasco - Euskal Herriko Unibertsitatea (Spain) in 2020
  • Language: Basque
  • Thesis committee: Kepa Sarasola Gabiola (chair), Pablo Gamallo Otero (secretary), Cristina España Bonet (member)
  • Links
    • Open-access thesis at: ADDI
  • Abstract
    • Modern machine translation relies on strong supervision in the form of parallel corpora. Such a requirement greatly departs from the way in which humans acquire language, and poses a major practical problem for low-resource language pairs. In this thesis, we develop a new paradigm that removes the dependency on parallel data altogether, relying on nothing but monolingual corpora to train unsupervised machine translation systems. For that purpose, our approach first aligns separately trained word representations in different languages based on their structural similarity, and uses them to initialize either a neural or a statistical machine translation system, which is further trained through iterative back-translation. While previous attempts at learning machine translation systems from monolingual corpora had strong limitations, our work, along with other contemporaneous developments, is the first to report positive results in standard, large-scale settings, establishing the foundations of unsupervised machine translation and opening exciting opportunities for future research.
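The abstract describes a training loop built on iterative back-translation: each direction's model is repeatedly retrained on synthetic parallel data produced by translating monolingual text with the opposite-direction model. Below is a minimal sketch of that loop in Python, assuming toy word-substitution tables in place of the neural or statistical systems of the thesis; all function names and the toy data are illustrative, not taken from the author's code.

```python
def train_model(pairs):
    """Toy 'training': build a word-level substitution table from
    (source_sentence, target_sentence) pairs, aligning words by position."""
    table = {}
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            table[s] = t
    return table

def translate(model, sentence):
    """Translate word by word, copying unknown words unchanged."""
    return " ".join(model.get(word, word) for word in sentence.split())

def iterative_back_translation(mono_src, mono_tgt, init_s2t, init_t2s, rounds=3):
    """Alternate between the two directions: each round, build synthetic
    parallel data with one model and retrain the other on it."""
    s2t, t2s = dict(init_s2t), dict(init_t2s)
    for _ in range(rounds):
        # Back-translate target-side monolingual text into pseudo-source,
        # yielding (pseudo-source, real target) pairs to train src->tgt.
        pseudo_pairs = [(translate(t2s, t), t) for t in mono_tgt]
        s2t = train_model(pseudo_pairs)
        # Symmetrically retrain the tgt->src direction.
        pseudo_pairs = [(translate(s2t, s), s) for s in mono_src]
        t2s = train_model(pseudo_pairs)
    return s2t, t2s
```

In the thesis the initial models come from cross-lingual word embeddings aligned by structural similarity; here a tiny seed dictionary plays that role, e.g. `iterative_back_translation(["kaixo mundua"], ["hello world"], {}, {"hello": "kaixo"})` bootstraps both directions from a single known word pair.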


