Razvoj srpske verzije rečnika za automatsku analizu teksta (LIWCser)
Development of Serbian Dictionary for Automatic Text Analysis (LIWCser)

Author(s): Jovana Bjekić, Ljiljana B. Lazarević, Milica Erić, Elena Stojimirović, Teodora Đokić
Subject(s): Psychology, Social Informatics, ICT Information and Communications Technologies
Published by: Филозофски факултет, Универзитет у Београду (Faculty of Philosophy, University of Belgrade)
Keywords: automatic text analysis; Serbian dictionary LIWCser for automatic text analysis; verbal behaviour; implicit and explicit measures

Summary/Abstract: Automatic text analysis is a methodological approach to studying individual differences in verbal behaviour. It extracts statistically analyzable information about the intensity and/or frequency of thematic and stylistic characteristics of verbal output. LIWC (Linguistic Inquiry and Word Count), a widely used software solution for automatic text analysis, works by matching word stems from its built-in dictionary against the words in the input text. It reports, for each predefined category, the percentage of the analyzed text that falls into it. Research suggests that data obtained by automatic text analysis can help explain the relationship between implicit and explicit measures, regardless of what is being measured (attitudes, pathological potential, basic personality traits, etc.). The topic of this paper is the development of the Serbian LIWC dictionary. The dictionary was developed in four phases: translation of the English LIWC dictionary; formation of lemmas; classification of word stems by absolute consensus among four independent raters (a word stem could be assigned to more than one category, depending on context); and revision of category content and creation of the final set of word stems. The final version of the LIWCser dictionary contains 12103 word stems classified into 65 categories (linguistic, psychological and personal concerns). Only four word stems (0.03%) were classified into eight categories, 22 (0.2%) into seven, 147 (1.2%) into six, and 568 (4.7%) into five. A further 1531 word stems (12.6%) were classified into four categories, 2913 (24.1%) into three, and 4800 (39.7%) into two, while 2188 (17.5%) were classified into only one category.
The LIWCser dictionary allows researchers to collect and analyze data on verbal behaviour and to study the relationship between implicit and explicit measures in different fields of psychology.
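
The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the LIWC software or the LIWCser dictionary: the three stems, the category names, the function name `category_percentages`, and the rule that a stem matches any word beginning with it are all assumptions made for illustration. The sketch also shows how one stem can count towards more than one category.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (stem -> categories); NOT taken from LIWCser.
TOY_DICTIONARY = {
    "sreć": {"positive_emotion", "affect"},
    "tug": {"negative_emotion", "affect"},
    "porodic": {"family"},
}


def category_percentages(text, dictionary):
    """Return {category: percentage of words matching a stem in that category}.

    Assumption: a stem matches every word that starts with it.
    """
    # Lowercase the text and split it into words (Unicode letters included).
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {}

    counts = Counter()
    for token in tokens:
        # Collect every category triggered by this word; a set ensures the
        # word is counted at most once per category.
        matched = set()
        for stem, categories in dictionary.items():
            if token.startswith(stem):
                matched |= categories
        counts.update(matched)

    return {cat: 100.0 * n / len(tokens) for cat, n in counts.items()}


sample = "Moja porodica je srećna, ali ponekad osećam tugu."
result = category_percentages(sample, TOY_DICTIONARY)
for category, percent in sorted(result.items()):
    print(f"{category}: {percent:.1f}%")
```

With this eight-word sample, the hypothetical `affect` category receives 25.0% (two matching words) and each of the other three categories 12.5%. A production dictionary would also need rules for exact-match entries and for context-dependent categories, which this sketch does not attempt.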

  • Issue Year: 15/2012
  • Issue No: 1
  • Page Range: 85-110
  • Page Count: 27
  • Language: Serbian