Ke klasifikaci morfologických variant
On the classification of morphological variants

Author(s): Václav Cvrček, Vilém Kodýtek
Subject(s): Language and Literature Studies
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: corpus analysis; effect size; language production heterogeneity; morphological variation; Shannon entropy; statistical analysis

Summary/Abstract: After briefly discussing the heterogeneities inherent in language production and how they influence corpus evidence, we describe a scale for classifying individual morphological variants by their relative frequencies. The scale has recently been proposed independently in <i>Mluvnice současné češtiny</i> (2010) (A Grammar of Contemporary Czech, hereafter <i>GCCz</i>), of which we are co-authors, and in Bermel & Knittl (2012). Variants with a relative frequency between roughly 1% and 10% are labelled “sparse” and “marked” by the respective authors, and those occurring in fewer than roughly 1% of cases “unexpected” and “isolated”. Another feature of the scale is the “equipollence” of the two variants of a doublet whose relative frequencies both lie between roughly 1/3 and 2/3 (for this criterion see also Štícha 2009). The scale in <i>GCCz</i> is heuristically based on Shannon entropy and applies to synchronic, functionally equivalent variants. Recently, R. Čech (2012) has claimed to have revealed “a serious statistical deficiency” in <i>GCCz</i>. We show that this claim is a misunderstanding that stems from his failure to distinguish between null-hypothesis significance testing and effect-size evaluation. We end with a brief note on the structure of the resources employed in <i>GCCz</i>.
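The abstract states the scale only in words. The Python sketch below is a rough illustration under our own assumptions. It maps a variant's relative frequency to the labels named above and contrasts that label with a significance test whose outcome depends on corpus size. The function names, the combined label strings (for example "sparse/marked"), and the handling of boundary values are all hypothetical, because the source says only "roughly". The binary entropy function is included for orientation only, since the abstract does not say how the cut-offs were derived from Shannon entropy. The z-test against an even 50/50 split is an illustrative stand-in for null-hypothesis testing in general. It is not the procedure used by Čech (2012).

```python
import math
from typing import Optional


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a two-way choice with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def classify_variant(share: float) -> Optional[str]:
    """Label a variant by its relative frequency (a proportion in [0, 1]).

    Cut-offs (1 %, 10 %, 1/3 to 2/3) follow the abstract; which side of each
    boundary is inclusive is an assumption of this sketch.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("relative frequency must lie in [0, 1]")
    if share < 0.01:
        return "unexpected/isolated"
    if share <= 0.10:
        return "sparse/marked"
    if 1 / 3 <= share <= 2 / 3:
        # Meaningful for one member of a doublet: the other member then also
        # falls between 1/3 and 2/3, so the pair is equipollent.
        return "equipollent"
    return None  # ranges the abstract does not label


def two_sided_p_vs_half(count: int, total: int) -> float:
    """Normal-approximation test of H0: share = 0.5 (illustrative only)."""
    share = count / total
    z = (share - 0.5) / math.sqrt(0.25 / total)
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Same proportion (0.45) observed in a small and in a very large sample.
    for count, total in [(45, 100), (450_000, 1_000_000)]:
        share = count / total
        print(
            f"N={total:>9}: share={share:.2f} "
            f"label={classify_variant(share)} "
            f"H={binary_entropy(share):.3f} "
            f"p={two_sided_p_vs_half(count, total):.3g}"
        )
```

In both runs the proportion, and therefore the label, is the same ("equipollent"). The p-value, however, falls from about 0.32 at N = 100 to effectively zero at N = 1,000,000. This shows why a frequency-based scale of this kind behaves as an effect-size measure and not as a significance test.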

  • Issue Year: 74/2013
  • Issue No: 2
  • Page Range: 139-145
  • Page Count: 7
  • Language: Czech