Eesti keele sõltuvuspuude pank ja selle keeleteoreetilised lähted
The Estonian Dependency Treebank and its Theoretical Basis

Author(s): Kadri Muischnek, Kaili Müürisep
Subject(s): Syntax, Estonian Literature, Finno-Ugrian studies
Published by: Teaduste Akadeemia Kirjastus
Keywords: dependency syntax; treebank; automatic syntax analysis; Estonian language

Summary/Abstract: This article presents the Estonian Dependency Treebank (EDT) and discusses its language-theoretical basis. EDT contains approximately 400,000 tokens of fiction, newspaper and scientific texts. Its syntactic annotation is based on the principles of dependency syntax. Previous experiments with annotating Estonian sentences according to the principles of phrase structure syntax have shown that the resulting trees tend to be too shallow and thus do not encode the linguistic information in the best possible way. Therefore, a dependency-syntactic representation was chosen instead. Dependency relations are efficient for encoding typical head-dependent relations such as verb-argument or head-modifier, but are less suitable for analyzing adpositional phrases, verbal chains, multi-word expressions or other constructions without clear internal syntactic relations. In such cases, there are arguments both for and against each possible solution.

  • Issue Year: 2016
  • Issue No: 62
  • Page Range: 122-145
  • Page Count: 24
  • Language: Estonian