Mērķhipotēžu izvirzīšana latviešu valodas apguvēju korpusā
Creating target hypotheses in a learner corpus of Latvian

Author(s): Ilze Auziņa, Kristine Levane-Petrova, Inga Kaija
Subject(s): Morphology, Syntax, Language acquisition, Computational linguistics, Baltic Languages
Published by: Latvijas Universitātes Akadēmiskais apgāds
Keywords: corpus; learner corpus; target hypothesis; language acquisition; error annotation; corpus linguistics

Summary/Abstract: A learner corpus is a computerized textual database of the language produced by foreign language learners. Such a corpus enables researchers to create more efficient learning materials and teaching methodology for language learners by means of corpus-driven error analysis. A learner corpus, like other language corpora, can be annotated at different language levels (morphologically, syntactically); however, corpus-based error annotation and corpus-based error analysis are especially important in learner language research. Error analysis is influenced by two factors: 1) the setup of error types, or error typology; and 2) the setup of the target hypothesis, i.e., the corrected text. Therefore, it is crucial to have dedicated guidelines that specify what is annotated and how the annotation is performed. The article begins with a description of “The Latvian Learner corpus” (LaVA) and its initial development strategies, the term target hypothesis, and the role of the target hypothesis in the creation of a learner corpus. The main criteria for setting up target hypotheses in the LaVA corpus are also presented, with examples showing how language learners’ utterances are corrected according to the language norms, along with the main permitted deviations from the rules. This work has received financial support from the Latvian Council of Science under grant agreement No. lzp-2018/1-0527 (“Development of Learner Corpus of Latvian: methods, tools and applications”) in synergy with the Latvian State Research Programme “Latvian Language”, agreement No. VPP-IZM-2018/2-0002 (subproject “Acquisition of Latvian Language”).

  • Issue Year: 2020
  • Issue No: 11
  • Page Range: 7-26
  • Page Count: 20
  • Language: Latvian