Levels of Annotation in the Bulgarian National Corpus

Author(s): Sia Kolkovska, Svetla Koeva, Diana Blagoeva
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: reference corpus; Bulgarian language; annotation; tokenisation; tagging; tagset; word senses

Summary/Abstract: The paper presents the levels of annotation adopted in the Bulgarian National Corpus (BulNC). The first stage of annotation consisted of dividing the text into tokens (words), and was followed by morphosyntactic and semantic analysis. The morphosyntactic annotation is to a great extent disambiguated, and parts of the corpus have also been annotated with word senses from the Bulgarian WordNet. Moreover, the BulNC is annotated syntactically with a parser based on a specially constructed right context-sensitive grammar. All levels of annotation are exploited in the BulNC search engine.

  • Issue Year: 2012
  • Issue No: 63
  • Page Range: 147-154
  • Page Count: 8
  • Language: English