This paper describes the implementation of a second-order hidden Markov model (HMM) based part-of-speech tagger for the Apertium free/open-source rule-based machine translation platform. We describe the part-of-speech (PoS) tagging approach in Apertium and how it is parametrised through a tagger definition file that defines: (1) the set of tags to be used and (2) constraint rules that can be used to forbid certain PoS tag sequences, thus refining the HMM parameters and increasing tagging accuracy. The paper also reviews the Baum-Welch algorithm used to estimate the HMM parameters and compares the tagging accuracy of the second-order tagger with that of the original first-order HMM-based PoS tagger in Apertium.
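
As a rough illustration of how a constraint rule that forbids a tag sequence could refine the parameters of a second-order HMM, the sketch below zeroes the affected transition probabilities and renormalises each row. It is not taken from Apertium: the function name `apply_forbid_rules`, the dictionary layout, the toy tag set, and the assumption that each rule forbids a pair of adjacent tags are all hypothetical choices made for this example.

```python
from itertools import product


def apply_forbid_rules(trans, forbidden):
    """Zero out forbidden transitions in a second-order HMM and renormalise.

    trans[(t1, t2)][t3] holds P(t3 | t1, t2) (hypothetical layout).
    forbidden is a set of (a, b) pairs meaning "tag b may not follow tag a".
    """
    refined = {}
    for (t1, t2), row in trans.items():
        # A transition into t3 is forbidden when (t2, t3) matches a rule.
        new_row = {
            t3: (0.0 if (t2, t3) in forbidden else p) for t3, p in row.items()
        }
        total = sum(new_row.values())
        if total > 0:
            # Redistribute the removed probability mass over allowed tags.
            new_row = {t3: p / total for t3, p in new_row.items()}
        refined[(t1, t2)] = new_row
    return refined


# Toy tag set and uniform initial transitions (illustrative only).
tags = ["DET", "NOUN", "VERB"]
uniform = {
    (a, b): {c: 1.0 / len(tags) for c in tags}
    for a, b in product(tags, repeat=2)
}

# Assumed rule: a verb may not directly follow a determiner.
forbidden = {("DET", "VERB")}
refined = apply_forbid_rules(uniform, forbidden)
```

In this sketch, rows whose second context tag is `DET` lose the `VERB` transition and share its probability mass among the remaining tags, while all other rows are unchanged. Starting parameter estimation (for example with Baum-Welch) from such constrained parameters keeps the forbidden transitions at zero, since re-estimation cannot assign mass to a transition with zero probability.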