O roli samodzielnie przygotowanych korpusów w badaniach językoznawczych
Self-compiled Corpora in Linguistic Research

(On the Example of an Internet Corpus)

Author(s): Marcin Zabawa
Subject(s): Language and Literature Studies, Theoretical Linguistics, Applied Linguistics, Morphology, Syntax, Lexis, Descriptive linguistics, Philology
Published by: Krakowskie Towarzystwo Popularyzowania Wiedzy o Komunikacji Językowej Tertium
Keywords: corpus; corpus linguistics; Internet language; press language

Summary/Abstract: The aim of this theoretical paper is to discuss the problems involved in compiling one’s own linguistic corpus. A linguist who wants to study, e.g., neologisms must base the analysis on some source. Formerly, the language of the press was frequently used as such a source; now, however, linguistic corpora and the Internet are used more often. The author points out that neither the National Corpus of Polish (NKJP) nor the Internet as a whole is the best choice (and both are definitely not sufficient) when a linguist intends to study, e.g., the newest vocabulary items in Polish. Using spoken language as the main source is even more problematic. The best solution, albeit also the most difficult and time-consuming one, is to compile one’s own linguistic corpus. The paper first discusses why neither the press nor the Internet as a whole should be regarded as the best source, and then turns to various theoretical aspects of compiling one’s own corpus (such as the choice of text types, corpus size, and the use of computer tools that aid corpus compilation).

  • Issue Year: 4/2019
  • Issue No: 1
  • Page Range: 211-232
  • Page Count: 22
  • Language: Polish