Parallel Corpora within the Russian National Corpus

Author(s): Dmitri Sitchinava
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: parallel corpora; alignment; morpho-syntactic tagging; tagset

Summary/Abstract: The paper presents the parallel corpora within the Russian National Corpus (RNC). It discusses the principles of text alignment and characterizes and evaluates a number of tools available for this purpose (e.g. LeoBilingua or HunAlign). It describes the morphological tagging of texts in languages whose grammatical categories differ. The author also enumerates the existing parallel corpora within the RNC and outlines plans for expanding the project, which is far from complete. Finally, examples of corpus-based research are given.

  • Issue Year: 2012
  • Issue No: 63
  • Page Range: 271-278
  • Page Count: 8
  • Language: English