Využití korpusu při tvorbě hesláře výkladového slovníku češtiny
The use of the corpus for making of the headword list of a Czech monolingual dictionary

Author(s): Vít Michalec, Jana Nová
Subject(s): Language and Literature Studies, Applied Linguistics, Descriptive linguistics
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: general monolingual dictionary; lexicography; headword list; corpus; average reduced frequency; raw frequency; foreign words; loanwords

Summary/Abstract: The headword list of the Academic Dictionary of Contemporary Czech (ASSC) is based mainly on language corpora. We describe the basic principles of the headword-list-making process: setting the proportions of the alphabetical sections, semi-automatic generation of a word list from a set of corpora, and manual cleaning of the word list to separate lexicographically suitable items from unsuitable ones. We give proportions and examples of the unsuitable items, e.g. foreign words, lemmatisation mistakes, abbreviations, word fragments, and typing errors, and we compare the alphabetical sections A–AM (mostly loanwords) and Č (mostly words of Czech origin). We also briefly discuss the possibility of using the average reduced frequency instead of the raw frequency to generate the word list, as well as the possible use of newer and larger language corpora.

  • Issue Year: 2020
  • Issue No: 21
  • Page Range: 3-16
  • Page Count: 14
  • Language: Czech