A quantitative analysis of spelling variation in the Corpus of Early English Correspondence (CEEC).

Kvantitativní analýza ortografické variability v Korpusu rané anglické korespondence
A quantitative analysis of spelling variation in the Corpus of Early English Correspondence (CEEC).

Author(s): Ondřej Tichý
Subject(s): Language and Literature Studies
Published by: Univerzita Karlova v Praze - Filozofická fakulta, Vydavatelství
Keywords: correspondence; corpus; entropy; variation; spelling

Summary/Abstract: The paper explores trends in spelling variation in Early English correspondence (15th–17th c.), using the Corpus of Early English Correspondence (CEEC) as its material. Overall change in spelling variation has so far been described only in relatively general terms and never on quantitative grounds. There is no doubt about the general direction of the change (towards greater standardisation, though not in a straightforward manner) or about its basic characteristics, such as its slower pace in private documents than in professional publications. To our knowledge, however, neither the data supporting these assertions nor precise definitions of spelling variation or regularisation have yet been provided. This paper introduces a novel methodology for quantifying spelling variation and regularity, which allows a more objective assessment of their change and makes use of the metadata provided by the CEEC, such as gender, letter authenticity, or the relationship/kinship between author and recipient. The paper explores the interactions of these variables from a diachronic perspective, using quantified levels of spelling regularity. The measure introduced for this purpose is based on weighted information (Shannon) entropy, which serves as a measure of the predictability of the spellings of individual functionally defined types; its calculation is partly based on the morphological tagging of the parsed version of the corpus.
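
As a rough illustration of the kind of measure described in the abstract, the sketch below computes the Shannon entropy of the spellings of each type and averages it across types. The abstract does not specify the weighting scheme or how types are defined, so weighting by token frequency, the (lemma, tag) type keys, and the function names `spelling_entropy` and `weighted_entropy` are all assumptions made here for illustration, not the paper's actual method.

```python
from collections import Counter
from math import log2


def spelling_entropy(counts):
    """Shannon entropy (in bits) of the spelling distribution of one type.

    `counts` maps each attested spelling to its number of occurrences.
    """
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values() if n)


def weighted_entropy(tokens):
    """Mean per-type spelling entropy, weighted by each type's token count.

    `tokens` is an iterable of (type_id, spelling) pairs. The frequency
    weighting is an assumption; the paper may weight differently.
    """
    by_type = {}
    for type_id, spelling in tokens:
        by_type.setdefault(type_id, Counter())[spelling] += 1
    total = sum(sum(c.values()) for c in by_type.values())
    if total == 0:
        return 0.0
    return sum(
        sum(c.values()) / total * spelling_entropy(c) for c in by_type.values()
    )


# Hypothetical sample: one type with two equally frequent spellings,
# one type with a single, fully regular spelling.
sample = [
    (("have", "VB"), "have"),
    (("have", "VB"), "haue"),
    (("the", "D"), "the"),
    (("the", "D"), "the"),
]
score = weighted_entropy(sample)
```

Under these assumptions, a lower score indicates more predictable (more regular) spelling; a type with only one attested spelling contributes zero entropy.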

  • Issue Year: 2018
  • Issue No: Special
  • Page Range: 27-41
  • Page Count: 15
  • Language: Czech