Automatická slovnědruhová desambiguace slova to v ustálených větných výrazech
The automatic part-of-speech disambiguation of the word to in fixed collocations

Author(s): Milena Hnátková
Subject(s): Language studies, Language and Literature Studies, Applied Linguistics
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: corpus; automatic morphological disambiguation; automatic identification of collocations; sentential phrases; word form to

Summary/Abstract: This paper deals with the automatic part-of-speech disambiguation of the word to (English: it) in Czech texts where it occurs in fixed collocations, used especially in spoken Czech, and also with identifying the case of the word's pronominal reading. The word to is ambiguous: automatic morphological analysis assigns it either the pronominal lemma ten (it), as a nominative/accusative singular neuter form, or the particle lemma to. Because it is very difficult to distinguish automatically between the nonprepositional nominative and accusative in Czech texts, the paper focuses primarily on to as a particle. The software module that identifies collocations in Czech corpus texts is part of the rule-based automatic morphological disambiguation used to tag texts of synchronic Czech in the corpora of the SYN series; it deals mainly with the disambiguation of nongrammatical collocations and phrases. The paper focuses on fixed expressions listed in the Dictionary of Czech Phraseology and Idiomatics and draws on a description of the automatic identification and classification of collocations containing the word to in the SYN2010 corpus. It also presents examples (primarily idioms) in which automatic disambiguation using general grammatical rules yields unreliable results.

  • Issue Year: 2013
  • Issue No: 7
  • Page Range: 22-35
  • Page Count: 14
  • Language: Czech