An HMM-based PoS tagger for Old Church Slavonic

Author(s): Olga Lyashevskaya, Ilia Afanasev
Subject(s): Language and Literature Studies, Applied Linguistics
Published by: Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: HMM tagger; Old Church Slavonic; PoS tagging; hybrid models; Universal Dependencies

Summary/Abstract: We present a hybrid HMM-based PoS tagger for Old Church Slavonic. The training corpus is a portion of a single text, Codex Marianus (40k), annotated with Universal Dependencies UPOS tags in the UD-PROIEL treebank. We run experiments in within-domain and out-of-domain settings: the remaining part of Codex Marianus serves as the within-domain test set, and Kiev Folia serves as the out-of-domain test set. Guided by per-PoS-class precision and sensitivity in each run, we combine a simple context-free n-gram-based approach with a Hidden Markov Model (HMM) and add linguistic rules for specific cases such as punctuation and digits. The model achieves a modest accuracy of 81% in the within-domain setting and 51% in the out-of-domain evaluation; the latter is comparable to the results of large neural architectures based on pre-trained contextual embeddings.
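
To illustrate the kind of hybrid the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes a first-order (bigram) HMM with add-one smoothing, decoded with the Viterbi algorithm, plus two hand-written rules that force the tag of punctuation and digit tokens. The function names (rule_tag, train, viterbi_tag), the smoothing scheme, and the toy training data are all hypothetical; the context-free n-gram component is omitted because the abstract does not specify it.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def rule_tag(token):
    """Hypothetical rules: force PUNCT for punctuation, NUM for digit strings."""
    if token and all(unicodedata.category(ch).startswith("P") for ch in token):
        return "PUNCT"
    if token.isdigit():
        return "NUM"
    return None


def train(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log-probability (an assumption of this sketch)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi_tag(tokens, trans, emit):
    """Viterbi decoding; a token matched by a rule keeps only the rule's tag."""
    if not tokens:
        return []
    seen_tags = sorted(emit)
    n_tags = len(set(seen_tags) | {"PUNCT", "NUM"})
    vocab_size = len({w for c in emit.values() for w in c}) + 1  # +1: unknown word
    empty = Counter()
    columns = []  # one dict per token: tag -> (log score, previous tag)
    for i, token in enumerate(tokens):
        forced = rule_tag(token)
        candidates = [forced] if forced else seen_tags
        column = {}
        for tag in candidates:
            # A rule-assigned tag is treated as certain (log-probability 0).
            e = 0.0 if forced else log_prob(emit[tag], token, vocab_size)
            if i == 0:
                t = log_prob(trans.get("<s>", empty), tag, n_tags)
                column[tag] = (t + e, None)
            else:
                score, prev = max(
                    (columns[-1][p][0] + log_prob(trans.get(p, empty), tag, n_tags), p)
                    for p in columns[-1]
                )
                column[tag] = (score + e, prev)
        columns.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(columns[-1], key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy placeholder data, not taken from Codex Marianus or any treebank.
toy = [
    [("alpha", "NOUN"), ("beta", "VERB"), (".", "PUNCT")],
    [("gamma", "PRON"), ("beta", "VERB"), ("alpha", "NOUN"), (".", "PUNCT")],
]
trans, emit = train(toy)
print(viterbi_tag(["alpha", "beta", "12", "!"], trans, emit))
```

Applying the rules inside the decoder, rather than as a post-processing step, lets a rule-assigned tag also inform the transition scores of neighbouring tokens. Whether the published tagger integrates its rules this way is not stated in the abstract.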

  • Issue Year: 72/2021
  • Issue No: 2
  • Page Range: 556-567
  • Page Count: 12
  • Language: English