MINTZAI: Sistemas de Aprendizaje Profundo E2E para Traducción Automática del Habla

Thierry Etchegoyhen; Haritz Arzelus; Harritxu Gete; Aitor Alvarez Muniain; Inmaculada Hernáez Rioja; Eva Navas Cordón; Ander González Docasal; Jaime Osácar; Edson Benites; Igor Ellakuria Santos; Eusebi Calonge; Maite Martín Roldán

Published in: Procesamiento del Lenguaje Natural, ISSN 1135-5948, No. 65, 2020, pp. 97-100
Language: Spanish
Parallel titles:
- MINTZAI: End-to-end Deep Learning for Speech Translation
Abstract
  Speech translation consists of translating speech in a source language into text or speech in a target language. Such systems have numerous applications and are of particular interest in multilingual communities such as the European Union. The standard approach in the field chains separate components for speech recognition, machine translation, and speech synthesis. With the advances made possible by artificial neural networks and deep learning, the prospect of building end-to-end speech translation systems, with no decomposition into intermediate stages, has given rise to intense research and development activity. In this paper, we review the state of the art in this area and present the MINTZAI project, which is being carried out in this field.