Sintaksinio sudėtingumo analizė anotuotame lietuvių kalbos tekstyne priklausomybių nuotolio metodu
Analysis of Syntactic Complexity in the Annotated Lithuanian Language Corpus by the Method of Dependency Distance

Author(s): Vytautas Ožeraitis
Subject(s): Syntax, Language acquisition, Baltic Languages
Published by: Kauno Technologijos Universitetas
Keywords: syntactic complexity; textual linguistics; distance of syntactic dependence; Lithuanian language; textual annotation

Summary/Abstract: Syntactic complexity is a feature common to all languages and is generally described as an assessment of the sophistication, elaborateness, length, and patterning of a sentence (or text) and its elements. For Lithuanian, syntactic complexity has not been widely analyzed. Studies of syntactic complexity are hampered by the unstable definition of the term and by the abundance of different methods for calculating it. This article presents a study of syntactic complexity in the syntactically annotated Lithuanian corpus ALKSNIS using the syntactic dependency distance method, which is based on Dependency Locality Theory. The article introduces the concept of syntactic complexity, presents the principles of its study and their relevance, and discusses the syntactic complexity results obtained for the corpus together with the advantages and disadvantages of the chosen method. The study aims to contribute to the analysis of syntactic complexity in Lithuanian. Two measures are used: the mean dependency distance and the modified mean dependency distance. The study analyzes corpus data and determines the syntactic complexity of individual sentences and texts. A detailed analysis reveals both the shortcomings of the methods used and their dependence on an accurate and consistent annotation scheme. The data also show the need to include links between sentences in syntactic complexity formulas. The position of the sentence vertex, which is included in the modified mean dependency distance formula, was found to potentially distort the results; the study therefore calls for further refinement of the formula. The boundaries of sentence and text complexity identified in the present study are indicative only, so further qualitative analysis and experiments are needed to define them with greater precision.
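
Illustrative note: the abstract does not give the formulas themselves. As a minimal sketch, assuming the commonly used definition of mean dependency distance (the average absolute difference between the position of each non-root word and the position of its head), the first measure could be computed for one sentence as shown below. The function name, the input encoding (1-based head positions, 0 for the root), and the sample sentence are hypothetical and not taken from the article; the modified measure is not sketched because its formula is not stated here.

```python
from typing import Sequence


def mean_dependency_distance(heads: Sequence[int]) -> float:
    """Mean dependency distance of one sentence (assumed standard definition).

    heads[i] holds the 1-based position of the head of word i + 1;
    the value 0 marks the sentence root, which has no head and is skipped.
    """
    distances = [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        # A one-word sentence has no dependency links.
        return 0.0
    return sum(distances) / len(distances)


# Hypothetical five-word sentence: word 2 is the root, words 3 and 4
# depend on word 5, and words 1 and 5 depend on word 2.
example_heads = [2, 0, 5, 5, 2]
print(mean_dependency_distance(example_heads))  # distances 1, 2, 1, 3 -> 1.75
```

A text-level value could then be obtained by pooling the distances of all sentences, although whether the article aggregates this way is not stated in the abstract.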

  • Issue Year: 2021
  • Issue No: 39
  • Page Range: 93-110
  • Page Count: 18
  • Language: Lithuanian