Automatizuotas arbitraliųjų kolokacijų atpažinimas: būdvardžių ir daiktavardžių kolokacijos
Automatic Recognition of Arbitrary Adjective-Noun Collocations

Author(s): Jolanta Kovalevskaitė, Erika Rimkutė, Jurgita Vaičenonienė
Subject(s): Language acquisition, Cognitive linguistics, Computational linguistics
Published by: Kauno Technologijos Universitetas
Keywords: arbitrary collocations; DELFI.lt text; computational linguistics; vector method; limited lexical conjunctions; adjectives; nouns

Summary/Abstract: This article focuses on arbitrary collocations, a particular type of collocation characterized by unmotivated relations between the constituents (unlike trivial or motivated collocations such as a beautiful day or new research). Arbitrary collocations typically show a degree of lexical restrictedness: although several close synonyms may be available, one of them is preferred in a given word combination, for example, broad/wide outlook vs. big outlook; strong health vs. powerful health. An analysis of 5000 adjective-noun collocations retrieved from the “Database of Lithuanian Multiword Expressions” identified approximately 650 arbitrary collocations by means of the synonym substitution test: if the adjectival component of the collocation (adjective or participle) could not be replaced by a close synonym, the collocation was considered arbitrary. Computational linguistics methods, specifically the word embedding approach, were used to automatically retrieve close synonyms of the adjectives in adjective-noun collocations. Nouns and participles were automatically grouped into approximately 800 vector strings. The article describes in detail the steps of data processing and analysis, as well as the criteria and methods used to identify arbitrary collocations with the Global Vectors (GloVe) model.
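
The synonym substitution test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-dimensional vectors, the set of attested combinations, and the function names (`cosine`, `nearest_neighbors`, `is_arbitrary`) are all hypothetical and invented for illustration. The sketch assumes that close synonyms can be approximated by the nearest neighbors of a word in an embedding space (as with GloVe vectors), and that a collocation is flagged as arbitrary when none of those neighbors forms an attested combination with the same noun.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, vectors, k=1):
    """Return the k words whose vectors are most similar to `word`."""
    target = vectors[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


def is_arbitrary(adjective, noun, vectors, attested, k=1):
    """Flag a collocation as arbitrary if no near-synonym of the
    adjective forms an attested combination with the same noun."""
    candidates = nearest_neighbors(adjective, vectors, k)
    return not any((c, noun) in attested for c in candidates)


# Toy vectors, invented for illustration (not real GloVe output).
toy_vectors = {
    "strong": [0.9, 0.1, 0.0],
    "powerful": [0.8, 0.2, 0.1],
    "beautiful": [0.0, 0.9, 0.3],
    "lovely": [0.1, 0.8, 0.4],
    "new": [0.1, 0.0, 0.9],
}

# Hypothetical set of adjective-noun pairs attested in a corpus.
toy_attested = {
    ("strong", "health"),
    ("beautiful", "day"),
    ("lovely", "day"),
}

print(is_arbitrary("strong", "health", toy_vectors, toy_attested))
print(is_arbitrary("beautiful", "day", toy_vectors, toy_attested))
```

With these toy data, "strong health" is flagged as arbitrary because its nearest neighbor "powerful" does not combine with "health", while "a beautiful day" is not flagged because "lovely day" is attested. In the study itself the embeddings only supplied the candidate synonyms; how the substitution judgment was made is described in the article, not in this sketch.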

  • Issue Year: 2021
  • Issue No: 39
  • Page Range: 71-84
  • Page Count: 14
  • Language: Lithuanian