Víceslovné lexikální entity v korpusu i lexikonu
Multiword Lexical Entities in the Corpus and Lexicon

Author(s): Marie Kopřivová, Hana Skoumalova, Hana Goláňová, Milena Hnátková, Anna Christou, Tomáš Jelínek, Jan Křivan, Vladimír Petkevič, Karla Tvrdá, Přemysl Vítovec, Pavel Vondřička
Subject(s): Electronic information storage and retrieval, Education and training, Applied Linguistics, Computational linguistics, Descriptive linguistics, Distance learning / e-learning
Published by: Univerzita Karlova v Praze - Filozofická fakulta, Vydavatelství
Keywords: multiword units; MWU database LEMUR; Academic Dictionary of Contemporary Czech; Czech corpora; phrasemes

Summary/Abstract: The aim of the Multiword Units (MWUs) for Digital Education Project is to extend the LEMUR lexicographic database and its software application for corpus annotation, linking the database with language corpora and with the Academic Dictionary of Contemporary Czech. In addition to a significant expansion of the number of MWUs in the database, the focus is on creating a new annotation program that will be able to capture fragments of MWUs and their combinations in Czech texts. As new units are added to the database, modifications and corrections of existing entries are also being made, both in the classification of individual collocations (e.g., proverbs are being revised) and in the database structure (new categories are being added). The first version of the program that will search the corpus and tag MWUs has already been created, replacing the existing FRANTA annotation program. The new description, taking into account several orthogonal axes, will assign a tag to each MWU and its components, providing the user with richer information. In the test run, it will be possible to toggle from corpus to database and from dictionary to database. A didactic manual is also planned, to teach students how to retrieve information about MWUs and work with lexicographic descriptions.

  • Issue Year: 107/2025
  • Issue No: 2
  • Page Range: 177-188
  • Page Count: 12
  • Language: Czech