Statistical methods for phrasal verb detection in Estonian dialects

Statistilised meetodid murdekorpuse ühendverbide tuvastamisel

Author(s): Kristel Uiboaed
Subject(s): Language and Literature Studies
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: computational linguistics; corpus linguistics; dialectology; methods and tools; statistics; Estonian

Summary/Abstract: The aim of this study was to assess several statistical methods for the automatic extraction of collocations from a corpus of Estonian dialects. To extract collocations, association measures (AMs) were applied, and association scores (ASs) were calculated for the collocation candidates found in the corpus. An AS indicates the strength of the collocational bond between two words. An advantage of AMs is that, in addition to the co-occurrence frequency of a word pair, they also take into account the marginal frequencies of the collocating words, i.e. how often each word occurs in the corpus overall. To calculate an AS, the following data are needed: the co-occurrence frequency, the marginal frequencies of the collocating words, the expected frequency and the sample size.
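To make the inputs listed above concrete, here is a minimal sketch of how an association score can be computed from the co-occurrence frequency O, the marginal frequencies f1 and f2, the sample size N, and the expected frequency E = f1 · f2 / N. The abstract does not say which AMs the study compared, so the three used here (pointwise mutual information, t-score and log-likelihood G²) are standard measures chosen purely for illustration. The function names are hypothetical, and N is assumed to be the total number of word-pair tokens in the corpus.

```python
import math


def expected_frequency(f1: int, f2: int, n: int) -> float:
    """Expected co-occurrence count of a pair under independence: E = f1 * f2 / N."""
    return f1 * f2 / n


def association_scores(o: int, f1: int, f2: int, n: int) -> dict[str, float]:
    """Compute illustrative association scores for one collocation candidate.

    o  -- observed co-occurrence frequency of the word pair
    f1 -- marginal frequency of the first word
    f2 -- marginal frequency of the second word
    n  -- sample size (assumed here: total number of word-pair tokens)
    """
    if not (0 < o <= min(f1, f2)) or f1 + f2 - o > n:
        raise ValueError("frequencies do not form a valid 2x2 contingency table")

    e = expected_frequency(f1, f2, n)

    # Pointwise mutual information: log ratio of observed to expected frequency.
    pmi = math.log2(o / e)

    # t-score: observed minus expected, scaled by an estimate of the standard deviation.
    t_score = (o - e) / math.sqrt(o)

    # Log-likelihood ratio G^2 over the full 2x2 contingency table.
    rows = (f1, n - f1)  # first word present / absent
    cols = (f2, n - f2)  # second word present / absent
    table = ((o, f1 - o), (f2 - o, n - f1 - f2 + o))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:  # a cell with O = 0 contributes 0 by convention
                exp = rows[i] * cols[j] / n
                g2 += obs * math.log(obs / exp)
    g2 *= 2

    return {"expected": e, "pmi": pmi, "t_score": t_score, "log_likelihood": g2}


if __name__ == "__main__":
    # Hypothetical counts: the pair co-occurs 30 times, the words occur
    # 100 and 50 times, in a sample of 10,000 word pairs (E = 0.5).
    print(association_scores(30, 100, 50, 10_000))
```

In practice, every candidate pair found in the corpus would be scored this way and the candidates ranked by score. The measures rank candidates differently: because PMI compares O with E directly, it tends to favour rare pairs, whereas the t-score and G² give more weight to pairs with higher co-occurrence frequencies.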

  • Issue Year: 2010
  • Issue No: 6
  • Page Range: 307-326
  • Page Count: 19
  • Language: Estonian