Spelling normalization of historical documents by using a machine translation approach

Authors: Miguel Domingo, Francisco Casacuberta Nolla
Published in: Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain / coordinated by Juan Antonio Pérez Ortiz, Felipe Sánchez Martínez, Miquel Esplà Gomis, Maja Popovic, Celia Rico Pérez, André Martins, Joachim Van den Bogaert, Mikel L. Forcada Zubizarreta, 2018, ISBN 978-84-09-01901-4, pp. 129-137
Language: English
Abstract
- The lack of a spelling convention in historical documents causes their orthography to vary depending on the author and the time period in which each document was written. This poses a problem for cultural heritage preservation, which strives to create a digital text version of each historical document. To address this problem, we propose three approaches, based on statistical, neural and character-based machine translation, to adapt a document's spelling to modern standards. We tested these approaches in different scenarios, obtaining very encouraging results.