Eesti keele A2–C1-taseme kirjalike tekstide võrdlev automaatanalüüs
Written Estonian at the levels A2–C1: Comparative automated analysis

Author(s): Kais Allkivi-Metsoja
Subject(s): Morphology, Lexis, Finno-Ugrian studies
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: natural language processing; CEFR levels; lexical complexity; morphological analysis; written learner language; Estonian

Summary/Abstract: To achieve the communicative purposes of the language proficiency levels defined in the Common European Framework of Reference for Languages (CEFR), a learner needs to acquire lexical and grammatical tools specific to the target language (L2). Yet there is little empirical evidence on the language-specific features that mark development from one level to the next. This study aims to determine which linguistic features distinguish A2–C1-level written use of Estonian as L2, and how: which levels differ significantly and in which direction the features change. Related research has either observed individual linguistic phenomena at different proficiency levels or automatically predicted the level of writings, without describing the level-to-level dynamics of the analysed features. This study attempts to bridge that gap. Using automated language processing and statistical analysis of the extracted data, two types of features are compared in Estonian proficiency examination writings of distinct levels: 1) lexical features, i.e. measures related to various aspects of lexical complexity; 2) morphological features, i.e. frequencies of parts of speech (PoS) and of grammatical categories of nominals and verbs. The analysed corpus comprises 480 creative writings, with each level represented by 120 texts randomly sampled from various examinations. Welch’s ANOVA with Bonferroni correction is used to test for significant differences between the proficiency levels, and between examinations of the same level to detect task-induced variance. For pairwise comparisons of proficiency levels, the Games-Howell post-hoc test is used. Correlation analysis and multidimensional scaling are applied to explore the co-occurrence of the linguistic features in learner texts.
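
To make the level comparison described in the abstract more concrete, the following is a minimal sketch of Welch's ANOVA and a Bonferroni-adjusted significance threshold in plain Python. It is not the author's implementation: the function names `welch_anova` and `bonferroni_alpha` and the toy numbers are assumptions introduced here for illustration only. The sketch returns the F statistic and degrees of freedom; a p-value would additionally require an F distribution from a statistics library.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's ANOVA over a list of samples.

    Each group is a list of per-text feature values (e.g. one lexical
    measure per text) for one proficiency level. Unequal variances and
    unequal group sizes are allowed.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2 (inverse variance of its mean).
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    between = sum(w * (m - grand_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))

    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold when n_tests features are tested."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical toy values, NOT data from the study.
    toy_levels = [[1, 2, 3], [2, 3, 4]]
    print(welch_anova(toy_levels))
    print(bonferroni_alpha(0.05, 10))
```

With only two groups, the F statistic computed this way equals the square of Welch's t statistic, which offers a quick sanity check on the formula.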

  • Issue Year: 2021
  • Issue No: 31
  • Page Range: 13-59
  • Page Count: 47
  • Language: Estonian