Phrasemes and Collocations in the Corpus – How to Find Unknown Variants

Author(s): Hana Skoumalová, Přemysl Vítovec, Milena Hnátková
Subject(s): Language studies, Theoretical Linguistics, Applied Linguistics, Lexis, Computational linguistics, Western Slavic Languages, ICT Information and Communications Technologies, Phraseology
Published by: SAV - Slovenská akadémia vied - Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: multiword expressions; corpus annotation; syntactic patterns; lexicon transformations; Czech language

Summary/Abstract: This paper addresses the identification and annotation of multiword expressions (MWEs) in Czech corpora, focusing on enhancing the search procedure through transformations of existing lexicon entries and the addition of new entries based on syntactic patterns. We discuss the limitations of current annotation systems and introduce a new, efficient annotation system that leverages a comprehensive MWE dictionary. Our methodology includes the use of syntactic patterns to identify new collocations, automatic transformations of known MWEs, and manual searches for creatively varied expressions. The results demonstrate significant improvements in the success rate of corpus annotation, with newly identified collocations and transformed MWEs contributing to a richer and more accurate linguistic resource.

  • Issue Year: 76/2025
  • Issue No: 1
  • Page Range: 212-222
  • Page Count: 11
  • Language: English