Diachronní složka Českého národního korpusu a hranice možností korpusového výzkumu vývoje češtiny
The diachronic part of the Czech National Corpus: Limitations of corpus research into the history of Czech

Author(s): Karel Kučera
Subject(s): Language and Literature Studies
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: annotation; corpus size; corpus structure; diachronic corpus; history of Czech

Summary/Abstract: The paper reviews the present state of the diachronic part of the Czech National Corpus, focusing on Diakorp, the two-million-word unannotated pivotal corpus, and on its limitations for corpus-based research into the history of Czech. Growth of at least 1,000,000 tokens, lemmatization, and morphological tagging are cited as near-future enhancements to the corpus. A series of thoroughly structured diachronic monitoring corpora, to be built from 2017 onward, is envisaged as a future basis for research into long-term trends in the history of Czech, complementing the quantity-oriented Diakorp.

  • Issue Year: 2014
  • Issue No: 4-5
  • Page Range: 208-215
  • Page Count: 8
  • Language: Czech