Riznica: The Croatian Language Corpus

Author(s): Dunja Brozović, Damir Ćavar
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: Croatian Language Corpus; annotation; compatibility; data processing; standardisation

Summary/Abstract: The paper describes the research project Riznica, which includes, among other things, the creation of a new corpus of contemporary Croatian (encompassing texts no older than 150 years). The article discusses in detail the technical aspects of corpus development, such as encoding, morphosyntactic annotation and methods of text acquisition. The Croatian Language Corpus developed within the project will enable multi-level text analysis covering various aspects of language structure, such as phonology and syntax. Its creators therefore need to employ appropriate language processing tools and adapt them to the analysis of the encoded information. The authors draw attention to the technical processes of designing uniform encoding and annotation standards in order to maximize compatibility with other corpora.

  • Issue Year: 2012
  • Issue No: 63
  • Page Range: 051-066
  • Page Count: 16
  • Language: English