Lietuvių kalbos leksemų morfologinis anotavimas: ypatumai ir sunkumai
Annotation of Lithuanian Lexemes: Peculiarities and Problems

Author(s): Jolanta Vaskelienė, Erika Rimkutė, Vaidotas Valskys
Subject(s): Language and Literature Studies
Published by: Kauno Technologijos Universitetas
Keywords: corpus; automatic morphological analysis; morphological annotator; nouns

Summary/Abstract: The article presents the principles of the morphological annotator and the peculiarities of automatic morphological analysis. It focuses on building the lexical database of the Lithuanian morphological annotator, one of the completed tasks of the project Internet Resources: Annotated Corpus of the Lithuanian Language and Tools of Annotation (ALKA 2), implemented in 2007-2008 and sponsored by the Lithuanian State Science and Studies Foundation. The selection of words to be included in the lexical database is described in detail, and the stages of morphological annotation and the difficulties encountered are also discussed. The lexical database has grown by 24 000 words (mostly proper and common nouns), so the quality of the morphological annotator is expected to improve considerably, with far fewer words left unrecognized. The goal of the article is to show the process of annotation. It reveals that problems arise not only when evaluating whether new words are acceptable in Lithuanian and identifying their meanings, but also during their morphological analysis: it is difficult to determine their declension paradigms, gender, number inflection, derivatives, etc.

  • Issue Year: 2009
  • Issue No: 15
  • Page Range: 63-70
  • Page Count: 8
  • Language: Lithuanian