
Eesti keele kui emakeele õppija tekstikorpus EMMA
The Estonian native-speaking students’ text corpus EMMA

Author(s): Kadri Sõrmus, Kersti Lepajõe
Subject(s): Language and Literature Studies, Language acquisition, Baltic Languages
Published by: Tallinna Ülikooli Kirjastus
Keywords: language corpora; students’ language use; language learning

Summary/Abstract: EMMA, the Estonian language learners’ text corpus being developed at the Institute of Estonian and General Linguistics of the University of Tartu, is an environment that gathers texts connected with the study processes of students learning Estonian as a native language. The article gives an overview of the basis for compiling the EMMA corpus, its character, annotation, analysis and research opportunities. The corpus texts fall into four categories: examination and level test papers, student research papers, essays sent to writing contests, and other texts. The student text corpus will include texts from two school levels: high school (grades 11 and 12) and middle school (grades 8 and 9). During the first phase of corpus creation in 2013–2016, the focus of the Estonian native-speaking students’ text corpus EMMA is on examination papers and level tests. Graduation essays of high school students have been collected in Estonia since 1997, when the compulsory Estonian language exam given at the end of high school started to be graded nationally. During the period 1997–2014, all high school graduates (approx. 7,000–10,000 12th graders per year) wrote 400–600 word argumentative essays as a national examination. In order to build the corpus, samples from 1999, 2002, 2005, 2008, 2011 and 2014 were selected, and the texts were scanned, typed in, and entered into the EMMA environment, and the first annotation was added, i.e. the mistakes marked by the nationally selected graders. By 2016, it is planned to enter at least 6,000 texts, including 3,000 national examination essays (approx. 600 words per essay) and 3,000 level tests (approx. 200 words), and to make the corpus accessible to researchers through the EMMA environment. So far, there are no electronic text corpora for analysing the language use of Estonian native-speaking students that enable quick searches and the use of contemporary research methods.
Therefore, creating a corpus of Estonian native-language learners’ texts is an important step in providing researchers of students’ papers and other researchers with trustworthy primary material and in creating more analysis opportunities. Hopefully, the language learners’ text corpus EMMA will fulfill its goals, contribute to research on texts written by Estonian native-language students and, through research outcomes, contribute to the quality of native language teaching and teaching materials.

  • Issue Year: 2014
  • Issue No: 16
  • Page Range: 205-227
  • Page Count: 23
  • Language: Estonian