HOW TO RECOGNIZE ADJECTIVES? AN ANALYSIS OF CORPUS PATTERNS

KUIDAS ÄRA TUNDA ADJEKTIIVI? KORPUSKÄITUMISE MUSTRITE ANALÜÜS

Author(s): Maria Tuulik, Ene Vainik, Geda Paulsen, Ahti Lohk
Subject(s): Morphology, Syntax, Lexis, Cognitive linguistics, Finno-Ugrian studies
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: parts of speech; morphosyntax; lexicography; language technology; Estonian

Summary/Abstract: This study was inspired by a survey of Estonian lexicographers (Paulsen, Vainik and Tuulik 2019), in which the lexicographers expressed the need for a new digital tool to help identify the word class of ambiguous words. For adjectives, they emphasized the difficulty of deciding whether a verb participle is used adjectivally often enough to be included in dictionaries as an adjective. In this article, we examine the morphosyntactic features characteristic of the adjective class and test different corpus parameters for distinguishing adjectives from other word classes. We report the test results for six parameters. The study analysed 12 groups of 10 words each. The test groups and test words were chosen manually, taking into account the problematic cases outlined by the lexicographers. We compared different types of adjectives and adjective-like words (the test groups) as well as other word classes (the control groups). To assess how well the parameters set adjectives apart, we conducted a deviation study: we determined a normative range for prototypical adjectives by setting a minimum and maximum value for every parameter, and then calculated how far the other test groups deviate from this prototypical adjective range. The groups of particular interest, regular verb participles and adjectives, were best differentiated by three parameters. The sentence-initial test word + noun parameter (which measured whether and how often a test word begins a sentence in attributive position, directly before a noun) set participles apart with 90% accuracy. The comparative form parameter (which measured whether a test word has comparative forms) was 100% accurate. The adverb parameter (which measured how often a test word is preceded by an adverb) distinguished adjectives from verb participles with 80% accuracy.
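The deviation study described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each word is described by per-parameter relative frequencies, takes the normative range of a parameter as the minimum and maximum over the prototypical adjectives, treats the deviation of a value as its distance to the nearest bound of that range (zero inside it), and reads an accuracy figure such as "90%" as the share of a test group that falls outside the adjective range. The parameter names `comparative_share` and `adverb_share`, the helper functions, and all numbers are hypothetical and made up for illustration.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]
Word = Dict[str, float]  # hypothetical: parameter name -> relative frequency


def normative_ranges(prototypes: List[Word]) -> Dict[str, Range]:
    """Min and max of each parameter over the prototypical adjectives."""
    params = prototypes[0].keys()
    return {
        p: (min(w[p] for w in prototypes), max(w[p] for w in prototypes))
        for p in params
    }


def deviation(value: float, rng: Range) -> float:
    """Assumed definition: 0 inside the range, else distance to the nearest bound."""
    lo, hi = rng
    if value < lo:
        return lo - value
    if value > hi:
        return value - hi
    return 0.0


def separation_rate(group: List[Word], param: str, ranges: Dict[str, Range]) -> float:
    """Share of a group's words that fall outside the adjective range for one parameter."""
    outside = sum(1 for w in group if deviation(w[param], ranges[param]) > 0)
    return outside / len(group)


# Made-up values for illustration only, not data from the study.
adjectives = [
    {"comparative_share": 0.30, "adverb_share": 0.20},
    {"comparative_share": 0.10, "adverb_share": 0.35},
    {"comparative_share": 0.25, "adverb_share": 0.15},
]
participles = [
    {"comparative_share": 0.00, "adverb_share": 0.05},
    {"comparative_share": 0.00, "adverb_share": 0.25},
]

ranges = normative_ranges(adjectives)
print(ranges["comparative_share"])                                # (0.1, 0.3)
print(separation_rate(participles, "comparative_share", ranges))  # 1.0
print(separation_rate(participles, "adverb_share", ranges))       # 0.5
```

With these toy values, both hypothetical participles lie below the adjective range for `comparative_share`, while only one lies outside it for `adverb_share`, which mirrors how one parameter can separate two groups better than another.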
Across all groups, the deviation study showed the comparative form parameter to be the most accurate at setting prototypical adjectives apart from the other test groups. A Euclidean distance analysis differentiated adjective-like test words and test groups from those that do not behave like prototypical adjectives. Since all tested parameters produced meaningful results and each differentiated some word classes from adjectives, they can serve as input for a new digital tool that would show how far a word deviates from prototypical word class representatives, helping lexicographers make word-class decisions.
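The Euclidean distance analysis can likewise be illustrated with a short sketch. It is an assumption-laden example rather than the study's own procedure: each word is treated as a vector of parameter values, the reference point is taken to be the centroid (mean vector) of the prototypical adjectives, and candidate words are ranked by their straight-line distance to that centroid, so that the most adjective-like candidates come first. The function names, the parameter names `comparative_share` and `adverb_share`, the candidate labels, and all values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Word = Dict[str, float]  # hypothetical: parameter name -> value


def centroid(words: List[Word]) -> Word:
    """Mean value of each parameter over the prototypical adjectives."""
    params = words[0].keys()
    return {p: sum(w[p] for w in words) / len(words) for p in params}


def euclidean(a: Word, b: Word) -> float:
    """Straight-line distance between two parameter vectors with the same keys."""
    return math.sqrt(sum((a[p] - b[p]) ** 2 for p in a))


def rank_by_distance(candidates: Dict[str, Word], prototypes: List[Word]) -> List[Tuple[str, float]]:
    """Candidates sorted from closest to farthest from the adjective centroid."""
    ref = centroid(prototypes)
    scored = [(label, euclidean(vec, ref)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Made-up values for illustration only, not data from the study.
prototypes = [
    {"comparative_share": 0.25, "adverb_share": 0.5},
    {"comparative_share": 0.75, "adverb_share": 0.5},
]
candidates = {
    "cand_c": {"comparative_share": 0.0, "adverb_share": 0.0},
    "cand_a": {"comparative_share": 0.5, "adverb_share": 0.5},
    "cand_b": {"comparative_share": 0.0, "adverb_share": 0.5},
}

for label, dist in rank_by_distance(candidates, prototypes):
    print(label, round(dist, 3))
```

Because real corpus parameters are measured on different scales, a practical version would likely normalize each parameter (for example, by min-max scaling) before computing distances; otherwise the parameter with the widest spread dominates the result.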

  • Issue Year: 2022
  • Issue No: 18
  • Page Range: 279-302
  • Page Count: 25
  • Language: Estonian