Valodas korpusu izmantošana latviešu valodas uzdevumu automātiskā ģenerēšanā
Use of the Language Corpora in Automatic Generation of Latvian Language Exercises

Author(s): Ilze Auziņa, Roberts Darģis, Inga Kaija, Kristine Levane-Petrova, Kristīne Pokratniece
Subject(s): Language acquisition, Computational linguistics, Baltic Languages
Published by: Latvijas Universitātes Literatūras, folkloras un mākslas institūts (Institute of Literature, Folklore and Art, University of Latvia)
Keywords: computational linguistics; language corpora; Latvian language acquisition; sentence selection; exercises

Summary/Abstract: Today, language corpora are not only the empirical basis of research but can also be used to develop a variety of data-driven teaching materials and tools. The experience of other countries shows that the development of self-assessment exercises for language learning can be partially or fully automated using language corpora and natural language processing (NLP) tools, which both increases the variety of exercises and supports teachers in implementing the curriculum. The Latvian Language Learners Corpus (LaVA), developed at the Institute of Mathematics and Computer Science, University of Latvia, includes more than 1,000 texts written by foreign learners of Latvian in their first or second semester of study at Latvian higher education institutions, who have reached proficiency level A1 (possibly A2). The corpus contains more than 180,000 words. Exercises and tests are generated on the basis of LaVA data analysis, including analysis of learner errors, which identifies problem areas in spelling, grammar, and vocabulary. The exercises are intended to help learners strengthen their linguistic competence in Latvian, for example, in using indicative verb forms in both simple and perfect (compound) tenses. The article discusses the methodology by which, based on statistical and quantitative analysis of the LaVA data, sample sentences are selected from various corpora of Latvian, for example, the Balanced Corpus of Modern Latvian (LVK2018) and the Corpus of Students’ Essays (SPK). It also describes the task-development algorithms and the development of a website for online self-assessment exercises.
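The abstract does not spell out the task-generation algorithm. As a rough illustration only, the following Python sketch shows one common way to turn corpus sentences into gap-fill (cloze) items targeting indicative present-tense verbs, assuming the sentences have already been lemmatized and morphologically tagged. The tag strings, the helper names `select_sentences` and `make_gap_item`, the length limits, and the "exactly one target form per sentence" filter are all hypothetical assumptions, not the authors' method, and the sample sentences are hand-written toy data, not corpus data.

```python
from dataclasses import dataclass

# Hypothetical tag scheme (assumption): "v:ind:pres" = indicative present verb,
# "v:ind:past" = indicative past verb, "punct" = punctuation.
TARGET = "v:ind:pres"


@dataclass(frozen=True)
class Token:
    form: str   # surface form in the sentence
    lemma: str  # dictionary form, shown to the learner as a hint
    tag: str    # hypothetical morphological tag


@dataclass(frozen=True)
class GapItem:
    prompt: str  # sentence with the target form blanked out
    hint: str    # lemma of the removed form
    answer: str  # the removed form itself


def select_sentences(sentences, target, min_words=3, max_words=12):
    """Keep sentences of a learner-friendly length (punctuation not counted)
    that contain exactly one token whose tag starts with `target`."""
    selected = []
    for sent in sentences:
        n_words = sum(1 for t in sent if t.tag != "punct")
        hits = [t for t in sent if t.tag.startswith(target)]
        if min_words <= n_words <= max_words and len(hits) == 1:
            selected.append(sent)
    return selected


def make_gap_item(sentence, target):
    """Blank out the target token and use its lemma as the hint."""
    words, answer, hint = [], "", ""
    for tok in sentence:
        text = tok.form
        if tok.tag.startswith(target):
            answer, hint, text = tok.form, tok.lemma, "____"
        if tok.tag == "punct" and words:
            words[-1] += text  # attach punctuation to the preceding word
        else:
            words.append(text)
    return GapItem(prompt=" ".join(words), hint=hint, answer=answer)


# Toy, hand-written sentences for illustration only (not corpus data).
SAMPLE_CORPUS = [
    [Token("Es", "es", "pron"), Token("lasu", "lasīt", "v:ind:pres"),
     Token("grāmatu", "grāmata", "n"), Token(".", ".", "punct")],
    # Two present-tense verbs: rejected by the "exactly one" rule.
    [Token("Mēs", "mēs", "pron"), Token("dzīvojam", "dzīvot", "v:ind:pres"),
     Token("Rīgā", "Rīga", "n"), Token("un", "un", "conj"),
     Token("strādājam", "strādāt", "v:ind:pres"), Token("birojā", "birojs", "n"),
     Token(".", ".", "punct")],
    # Past tense only: no target form, so rejected.
    [Token("Vakar", "vakar", "adv"), Token("viņš", "viņš", "pron"),
     Token("lasīja", "lasīt", "v:ind:past"), Token("avīzi", "avīze", "n"),
     Token(".", ".", "punct")],
]

if __name__ == "__main__":
    for sent in select_sentences(SAMPLE_CORPUS, TARGET):
        item = make_gap_item(sent, TARGET)
        print(f"{item.prompt} ({item.hint})")
```

Running the script prints `Es ____ grāmatu. (lasīt)`; the other two sentences are filtered out. Perfect (compound) tenses would need a multi-token target (a form of the auxiliary `būt` plus a participle), which this sketch does not handle.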

  • Issue Year: 2022
  • Issue No: 47
  • Page Range: 264-282
  • Page Count: 19
  • Language: Latvian