K efektivitě manuální a poloautomatické excerpce neologismů
On the efficiency of manual and semi-automatic detection of neologisms

Author(s): Jakub Sláma
Subject(s): Western Slavic Languages
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: data collection; manual detection of neologisms; neologisms; Python; semi-automatic detection of neologisms

Summary/Abstract: The paper presents a simple semi-automatic neologism detection procedure: a trivial Python script processes a text file, making use of a Czech morphological tagger, and extracts all words unrecognized by the tagger as potential neologisms. The list of these candidates has to be checked by a human (hence the label semi-automatic). This method was applied to a set of texts that were also analyzed in a more traditional way, by the “reading and marking” technique (i.e. the current practice). The comparison of the two methods has revealed that the semi-automatic procedure clearly outperforms the current practice both in speed and in efficiency.
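The extraction step described in the abstract can be illustrated with a minimal sketch. This is not the author's script: the function names, the regular expression, and the sample sentence are all assumptions made for illustration, and the morphological tagger is replaced by a hypothetical stand-in (`toy_is_known`, backed by a tiny hand-made word list). In practice that stand-in would presumably be swapped for a call to a real Czech tagger that reports whether a word form was recognized.

```python
import re
from collections import Counter
from typing import Callable

# Runs of Unicode letters (covers Czech diacritics); digits and punctuation are skipped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list:
    """Split raw text into word tokens."""
    return TOKEN_RE.findall(text)


def neologism_candidates(text: str, is_known: Callable[[str], bool]) -> Counter:
    """Count every token that the supplied recognizer does not know.

    `is_known` is an assumed interface: it takes a word form and returns
    True when the tagger recognizes it. The result is only a candidate
    list; a human still has to check it (the "semi-automatic" part).
    """
    counts = Counter()
    for token in tokenize(text):
        if not is_known(token):
            counts[token.lower()] += 1
    return counts


# Hypothetical stand-in for a real morphological tagger: a tiny word list.
KNOWN_FORMS = {"to", "je", "nový", "telefon"}


def toy_is_known(token: str) -> bool:
    return token.lower() in KNOWN_FORMS


# Invented sample text; "smartfoun" is deliberately absent from KNOWN_FORMS.
sample = "To je nový smartfoun. Smartfoun je nový telefon."
candidates = neologism_candidates(sample, toy_is_known)

for word, freq in candidates.most_common():
    print(word, freq)
```

Passing the recognizer in as a function keeps the candidate extraction independent of any particular tagger, so the same loop can be reused whichever tool supplies the "recognized or not" decision.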

  • Issue Year: 102/2019
  • Issue No: 1-2
  • Page Range: 64-75
  • Page Count: 12
  • Language: Czech