Korpusový výzkum mluveného jazyka na příkladu češtiny a angličtiny: současný stav
Corpus-based research of spoken language: the state-of-the-art for Czech and English

Author(s): Anna Čermáková, Marie Kopřivová
Subject(s): Computational linguistics
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: spoken language research; corpus linguistics; spoken Czech; basic descriptive unit for spoken language

Summary/Abstract: The article reviews corpus-based research on spoken language, with an emphasis on how the grammar of spoken language is described and conceptualized in relation to the grammar of written language. The review first briefly traces the development of spoken corpora, from simple transcriptions without sound alignment to today’s sophisticated multimodal corpora. The main part of the article deals with the metalanguage used to describe spoken language, the choice of its basic descriptive unit, the status of basic linguistic categories such as part of speech, and typical lexical and grammatical devices. The extensive existing research on spoken English is reviewed and, in line with it, illustrative examples from Czech spoken corpora are provided. These are then contrasted with examples from written data to highlight the inherent differences between spoken and written language and the need to adjust the descriptive metalanguage accordingly.

  • Issue Year: 79/2018
  • Issue No: 3
  • Page Range: 217-240
  • Page Count: 24
  • Language: Czech