Detailed Error Annotation for Morphologically Rich Languages: Latvian Use Case

Roberts Darģis [1]; Ilze Auziņa [1]; Kristīne Levāne-Petrova [1]; Inga Kaija [1]
[1] Institute of Mathematics and Computer Science, University of Latvia; Rīga Stradiņš University
Published in: Human Language Technologies – The Baltic Perspective: Proceedings of the Ninth International Conference Baltic HLT 2020 / ed. by Andrius Utka, Jurgita Vaičenonienė, Jolanta Kovalevskaitė, Danguolė Kalinauskaitė, 2024, ISBN 978-1-64368-116-0, pp. 241-244
Language: English
Abstract
- This paper presents a detailed error annotation schema for morphologically rich languages. The approach is used to create the Latvian Language Learner corpus (LaVA), which is part of the ongoing project "Development of Learner corpus of Latvian: methods, tools and applications". An advanced multi-token error annotation schema is not needed, because the error-annotated texts are written by beginner-level (A1 and A2) learners who use simple syntactic structures. Instead, the schema focuses on in-depth categorization of spelling and word formation errors. The annotation schema will work best for languages with relatively free word order and rich morphology.
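As a rough illustration of what a single-token error annotation could look like (the abstract notes that multi-token annotation is unnecessary for A1 and A2 texts), the minimal sketch below attaches each error to exactly one token and gives it a top-level category plus a finer subcategory. This is not the LaVA schema: the class `TokenError`, the helpers `annotate`, `apply_corrections` and `category_counts`, and the labels `spelling`, `word_formation` and `diacritic` are all hypothetical names chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical top-level categories; the actual LaVA labels may differ.
CATEGORIES = {"spelling", "word_formation"}


@dataclass(frozen=True)
class TokenError:
    """One error tied to exactly one token (no multi-token spans)."""
    token_index: int   # position of the erroneous token in the sentence
    original: str      # form written by the learner
    correction: str    # corrected (target) form
    category: str      # hypothetical top-level category
    subcategory: str   # hypothetical finer label, e.g. "diacritic"


def annotate(tokens, index, correction, category, subcategory):
    """Create a single-token annotation after basic validity checks."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    if not 0 <= index < len(tokens):
        raise IndexError("token index out of range")
    return TokenError(index, tokens[index], correction, category, subcategory)


def apply_corrections(tokens, errors):
    """Return a new token list with every annotated token replaced."""
    fixed = list(tokens)
    for err in errors:
        fixed[err.token_index] = err.correction
    return fixed


def category_counts(errors):
    """Count annotations per (category, subcategory) pair."""
    return Counter((e.category, e.subcategory) for e in errors)


# Learner sentence with two missing long-vowel marks:
# "Es dzivo Rīgā un macos latviešu valodu." (target: dzīvo, mācos)
tokens = ["Es", "dzivo", "Rīgā", "un", "macos", "latviešu", "valodu", "."]
errors = [
    annotate(tokens, 1, "dzīvo", "spelling", "diacritic"),
    annotate(tokens, 4, "mācos", "spelling", "diacritic"),
]

print(" ".join(apply_corrections(tokens, errors)))
print(category_counts(errors))
```

Keeping each annotation bound to a single token index keeps the format simple, and the two-level label is one way to express the fine-grained spelling and word formation categories the abstract emphasizes. A schema for advanced learners would likely need span-based annotations instead.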