Lietuvių kalbos morfologiškai ir sintaksiškai anotuoti tekstynai
Lithuanian morphologically annotated corpus and tree bank

Author(s): Erika Rimkutė, Loïc Boizou, Agnė Bielinskienė
Subject(s): Theoretical Linguistics, Morphology, Syntax
Published by: Lietuvių Kalbos Institutas
Keywords: corpus; automatic morphological analysis; automatic syntactic analysis; tree bank; language technologies

Summary/Abstract: Annotated corpora are fundamental resources for developing language technology. The size, quality, and structure of such corpora directly influence the development of other tools. This article describes two annotated corpora prepared by the Centre of Computational Linguistics at Vytautas Magnus University: MATAS, a morphologically annotated corpus, and ALKSNIS, a tree bank. It mainly discusses the structure and tag set of both corpora, as well as the annotation procedure and tools. Both corpora are available online through the ANNIS interface, so the syntax of simple and complex ANNIS queries is summarized for potential Lithuanian users.

  • Issue Year: 2017
  • Issue No: 90
  • Page Range: 1-30
  • Page Count: 30
  • Language: Lithuanian