Leksikograafide ja keeleõppijate hinnangud automaatselt tuvastatud korpuslausete sobivusele õppesõnastiku näitelauseks
Suitability of automatically selected example sentences for learners’ dictionaries as tested on lexicographers and language learners

Author(s): Kristina Koppel
Subject(s): Language studies, Education, Lexis, Language acquisition
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: corpus lexicography; learners’ lexicography; example sentences; GDEX; Estonian

Summary/Abstract: This paper reports on an assessment task carried out among students of Tallinn University and the University of Tartu who speak Estonian at B2-C1 proficiency level, and among lexicographers working at the Institute of the Estonian Language. The purpose of the task was to determine whether, according to these two types of annotators, authentic and unedited corpus sentences would be suitable as example sentences for learners’ dictionaries at B2-C1 level. The results of the assessment task were also to help evaluate the output of version 1.4 of the Estonian module of GDEX (GDEX 1.4), used to choose and display web sentences in the Institute’s new language portal Sõnaveeb. GDEX (Good Dictionary Example) is a function of the corpus query system Sketch Engine, designed to find optimal example sentence candidates from large corpora. The results of the assessment task confirmed three hypotheses: 1) corpus sentences need to be filtered before authentic corpus sentences are displayed to end-users; 2) GDEX 1.4 can identify good example candidates from corpora and filter out inappropriate candidates; 3) example sentences compiled by lexicographers are suitable example sentences. Both types of annotators considered as many as 96% of the dictionary examples, and 85% of the corpus sentences chosen as good examples by GDEX 1.4, to be suitable example sentences. Only 6% of the sentences discarded by GDEX 1.4 were considered suitable, meaning that 94% of the bad candidates had been filtered out successfully. As for unfiltered corpus sentences, 60% of those were considered unsuitable. When asked why they considered a sentence unsuitable, the annotators most commonly argued that the sentence includes anaphora and hence needs more context, or that it is colloquial, too long or too short.

  • Issue Year: 2019
  • Issue No: 29
  • Page Range: 84-112
  • Page Count: 29
  • Language: Estonian