Lemmatization of the DIA1900 Diachronic Corpus

Author(s): Lucie Benešová, Klára Pivoňková, Martin Stluka
Subject(s): Language and Literature Studies, Theoretical Linguistics, Applied Linguistics, Morphology
Published by: Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: Czech; diachronic corpus; disambiguation; lemmatization; morphological dictionary; variability

Summary/Abstract: This paper focuses on the lemmatization of DIA1900, an upcoming Czech diachronic corpus covering the second half of the 19th century. The article describes the different approaches taken to lemmatizing synchronic written, spoken and diachronic corpora within the Czech National Corpus project, including single- and multilevel lemmatization, and the available tools used to link the variants.

  • Issue Year: 74/2023
  • Issue No: 1
  • Page Range: 275-284
  • Page Count: 10
  • Language: English