We can’t get by without the pragmatics corpus. Corpus pragmatics and the pragmatics corpus

Ei saa me läbi “Pragmaatika” korpuseta. Korpuspragmaatika ja pragmaatikakorpus
We can’t get by without the pragmatics corpus. Corpus pragmatics and the pragmatics corpus

Author(s): Külli Prillop, Tiit Hennoste, Külli Habicht, Helle Metslang
Subject(s): Customs / Folklore, Applied Linguistics, Pragmatics, Finno-Ugrian studies, Cultural Anthropology / Ethnology, Culture and social structure, Philology
Published by: Eesti Kirjandusmuuseum
Keywords: discourse marker; Estonian language; language statistics; corpus pragmatics; register; (inter)subjectivity; text type

Summary/Abstract: Within the project “Pragmatics above grammar: Subjectivity and intersubjectivity in Estonian registers and text types” (PRG341), we are studying the expression of subjectivity and intersubjectivity in different written and spoken registers of modern Estonian. We focus on adverbs that function as discourse markers (e.g. vist ‘maybe, probably’, ilmselt ‘apparently, obviously’, tegelikult ‘actually’), markers that develop from main clauses containing cognition verbs that take sentence complements (e.g. (ma) arvan ‘I think’, usun ‘I believe’, (mulle) tundub ‘it seems (to me), it appears (that)’), as well as modal and performative verbs (e.g. võib (juhtuda) ‘can (happen)’, peaks (tulema) ‘should (come)’; kinnitan/väidan (olevat) ‘I affirm/claim’). The analysis combines quantitative corpus-linguistic and qualitative pragmatic approaches and thus belongs to the field of corpus pragmatics. Unlike previous studies of related topics, the project systematically compares the usage of markers in different registers (spoken, online communication, print texts) and text types. The pilot studies performed thus far have revealed several problems with the existing Estonian corpora that matter for the study of pragmatics. Firstly, some text types are underrepresented or not represented at all, text types cannot always be distinguished, and a particular text may not always correspond to its nominal text type (e.g. an academic text may contain quotes from texts of other types). All of this makes comparative statistical analysis of different text types difficult. Secondly, the markers under examination are multifunctional, and identifying their (inter)subjective function requires consideration of context broader than a single sentence. However, the public search systems for the existing corpora do not provide this context.
For instance, the discourse marker function of cognition verbs is indicated primarily by the fact that the topic of the conversation or text continues through the subordinate clause, not the main clause. Since the available search systems do not provide context larger than a single sentence, it is harder to identify the topic of the discourse and therefore the potential discourse-marker function of the verb. To avoid these problems, the project working group is developing a new “Pragmatics” corpus in the SketchEngine environment. The corpus is made up of 10 subcorpora representing different text types and registers. Each subcorpus contains roughly 500,000 words.

  • Issue Year: 2021
  • Issue No: 81
  • Page Range: 161-176
  • Page Count: 16
  • Language: Estonian