Mokomasis lietuvių kalbos vartosenos leksikonas – nauja tekstyno pagrindu parengta leksinė bazė
A New Corpus-Driven Lexical Database for Lithuanian as A Foreign Language

Author(s): Jolanta Kovalevskaitė, Erika Rimkutė
Subject(s): Language studies, Lexis, Baltic Languages
Published by: Vytauto Didžiojo Universitetas
Keywords: Lexical database; Corpus Pattern Analysis; Corpus; Corpus linguistics; Learner lexicography; Lithuanian language

Summary/Abstract: In this paper, we describe a new lexicographic resource for advanced learners of Lithuanian, the Lexical Database of “Lithuanian Language Usage”. It is the first attempt in Lithuanian lexicography to describe vocabulary on the basis of word usage analysis in a particular corpus. The written subpart of the Lithuanian Pedagogic Corpus (approx. 620,000 tokens) was used to develop headword lists and to collect word usage information in the form of corpus patterns. The database contains 3,700 lexical items: words and multi-word units (compounds, idioms or sayings). For the approx. 700 most frequent words from the shared vocabulary (they appear in texts assigned to the A1, A2, B1 and B2 levels, and they occur 100 times or more in the whole corpus), we prepared a full-record entry, which includes sense-related corpus patterns with grammatical, semantic and lexical information, together with examples illustrating all pattern components. A short-record entry (no patterns, only examples) is prepared for the less frequent words from the shared vocabulary, which are derivationally related to the most frequent headwords. Users are provided with 2,542 derivatives, which are linked to 940 headwords. In total, 28,550 encoding examples were manually selected for all 3,000 headwords and 700 phrases. We discuss the features of the database and, in particular, the semi-automated procedure of Corpus Pattern Analysis adopted for describing word usage. We evaluate the approach, discuss its advantages for users, and suggest future improvements to the resource. The database can be used as an additional resource in the Lithuanian as a foreign language classroom and, together with the available corpora, fills a gap in usage information in the existing (learner) dictionaries.

  • Issue Year: 2022
  • Issue No: 20
  • Page Range: 154-193
  • Page Count: 40
  • Language: Lithuanian