Víceslovné jednotky typické pro české akademické texty
Multi-Word Units in Czech Academic Texts

Author(s): Dominika Kováříková, Oleg Kovářík, Lucie Lukešová
Subject(s): Theoretical Linguistics
Published by: Univerzita Karlova v Praze - Filozofická fakulta, Vydavatelství
Keywords: academic texts; academic vocabulary; multi-word units; corpus research

Summary/Abstract: This paper introduces Akalex, a new online tool created to support vocabulary research into Czech academic texts. The Akalex database contains nearly 60,000 n-grams (candidates for typical academic words or multi-word units) and can be searched and filtered according to several criteria. The n-grams were extracted from the SYN2015 corpus of contemporary written Czech on the basis of their prominent frequency in academic texts and their occurrence across many different academic disciplines, which distinguishes them from general vocabulary on the one hand and specialized terminology on the other. Each n-gram in the database is also furnished with additional information, such as part of speech, distribution across disciplines and frequency, making it possible to search for, e.g., collocations with a specific lexeme (such as adjectives combined with the word výzkum ‘research’ or verbs with a certain preposition). The features of Akalex were put to the test in a case study covering 2-grams to 6-grams used in all 24 academic disciplines included in the SYN2015 corpus. Of nearly 900 candidates, 236 were manually selected by two annotators as typical of academic texts. These were then further analysed and divided into groups according to their semantic, functional and formal features. Among the most frequent were lexical bundles, collocations with content words, and combinations of two verbs, which point to the frequent use of passives in academic texts.
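
The selection principle described in the abstract (prominent frequency in academic texts combined with occurrence across several disciplines) can be illustrated with a minimal sketch. This is not the authors' procedure or the Akalex implementation: the function name `academic_candidates`, the frequency-ratio measure, the add-one smoothing, the thresholds `min_ratio` and `min_disciplines`, and the toy sentences are all assumptions made here for illustration only.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def academic_candidates(academic_docs, reference_docs, n=2,
                        min_ratio=1.5, min_disciplines=2):
    """Hypothetical filter: keep n-grams that are relatively more frequent
    in academic texts than in reference texts AND occur in at least
    `min_disciplines` different disciplines.

    academic_docs:  list of (discipline, tokens) pairs
    reference_docs: list of token lists (non-academic texts)
    """
    acad_freq = Counter()
    seen_in = defaultdict(set)  # n-gram -> disciplines it occurs in
    for discipline, tokens in academic_docs:
        for gram in ngrams(tokens, n):
            acad_freq[gram] += 1
            seen_in[gram].add(discipline)

    ref_freq = Counter()
    for tokens in reference_docs:
        ref_freq.update(ngrams(tokens, n))

    acad_total = sum(acad_freq.values())
    ref_total = sum(ref_freq.values())

    candidates = []
    for gram, count in acad_freq.items():
        if len(seen_in[gram]) < min_disciplines:
            continue  # too discipline-specific: likely terminology
        acad_rel = count / acad_total
        # Add-one smoothing (an assumption) avoids division by zero
        # for n-grams that never occur in the reference texts.
        ref_rel = (ref_freq[gram] + 1) / (ref_total + 1)
        if acad_rel / ref_rel >= min_ratio:
            candidates.append((" ".join(gram), count, len(seen_in[gram])))

    # Most frequent first; ties broken alphabetically.
    return sorted(candidates, key=lambda c: (-c[1], c[0]))


# Toy data, invented for this sketch (not taken from SYN2015).
academic_docs = [
    ("linguistics", "na základě výzkumu lze říci".split()),
    ("physics", "na základě měření lze říci".split()),
    ("physics", "kvantová mechanika".split()),
]
reference_docs = [
    "šel jsem domů a spal".split(),
    "na stole leží kniha".split(),
]

candidates = academic_candidates(academic_docs, reference_docs)
for text, freq, n_disciplines in candidates:
    print(text, freq, n_disciplines)
```

In this sketch the discipline threshold is what separates cross-disciplinary academic vocabulary from specialized terminology, while the frequency ratio separates it from general vocabulary. Candidates produced by a filter of this kind would still need manual review, as in the case study, where two annotators selected 236 of nearly 900 candidates.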

  • Issue Year: 103/2021
  • Issue No: 2
  • Page Range: 228-243
  • Page Count: 16
  • Language: Czech