CapekDraCor: A New Contribution to the European Programmable Drama Corpora

Author(s): Petr Pořízka
Subject(s): Language and Literature Studies, Applied Linguistics, Studies of Literature, Czech Literature
Published by: Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: data annotation; computational literary studies; corpus building; drama; DraCor; network analysis; quantitative analysis

Summary/Abstract: The aim of this paper is to present the new CapekDraCor corpus and the DraCor project, with its research-oriented concept of programmable corpora focused on quantitative analyses within the framework of computational literary studies. This digital platform extends the possibilities of large-scale drama analysis with a focus on dramatic characters. The basic operationalisation is the interaction within a dramatic configuration, i.e., the scenic co-presence of two speakers, from which network data are automatically extracted: both global interaction networks of whole dramas and data characterising individual actors, i.e., literary characters. The paper presents the CapekDraCor corpus, a new contribution to the extensive DraCor database, and describes how the data are processed with respect to their specific multi-layered structure. The corpus contains all the plays written by Karel and Josef Čapek, and the data are encoded in a standardized XML format that follows the general TEI guidelines for drama and uses a defined basic drama tagset. CapekDraCor also uses the newly created EZdrama format, a lightweight YAML-like markup language that serves as an intermediate step between a .txt file and an .xml file. A file in this format can be automatically converted into a DraCor-ready XML file with a TEI header. The advantage of the programmable corpora concept is that suitably structured data can be used for drama research outside the DraCor platform and with other methods or tools for textual analysis. At the same time, this approach shifts the researcher's attention from the technical requirements of the analysis to operationalised computational analysis based on research questions and on pre-prepared, flexible tools.
DraCor is a unique open infrastructure (both in terms of data and tools) for the analysis of European drama, currently comprising 15 corpora in 10 different languages with a total of about 3,000 plays from a wide range of periods.
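
The co-presence operationalisation described in the abstract can be illustrated with a minimal Python sketch. It is not the DraCor implementation. It assumes that scenes are encoded as TEI `div` elements with `type="scene"` and that each speech (`sp`) names its speaker in a `who` attribute. The sample fragment, the character identifiers, and the function names are hypothetical and are not taken from any CapekDraCor play.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical two-scene fragment (illustration only, not corpus data).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="scene">
      <sp who="#anna"><p>First line.</p></sp>
      <sp who="#boris"><p>Second line.</p></sp>
      <sp who="#anna"><p>Third line.</p></sp>
    </div>
    <div type="scene">
      <sp who="#boris"><p>Fourth line.</p></sp>
      <sp who="#clara"><p>Fifth line.</p></sp>
    </div>
  </body></text>
</TEI>"""


def speakers_per_scene(xml_text):
    """Return one set of speaker ids per scene (assumed: div type='scene')."""
    root = ET.fromstring(xml_text)
    scenes = []
    for div in root.iterfind(".//tei:div[@type='scene']", TEI_NS):
        speakers = set()
        for sp in div.iterfind(".//tei:sp", TEI_NS):
            # @who may hold several space-separated references such as "#a #b".
            for ref in sp.get("who", "").split():
                speakers.add(ref.lstrip("#"))
        scenes.append(speakers)
    return scenes


def copresence_edges(scenes):
    """Count, for each pair of speakers, the scenes in which both speak."""
    edges = Counter()
    for speakers in scenes:
        for a, b in combinations(sorted(speakers), 2):
            edges[(a, b)] += 1
    return edges


def degrees(edges):
    """Number of distinct co-present partners per character."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


scenes = speakers_per_scene(SAMPLE)
edges = copresence_edges(scenes)
character_degrees = degrees(edges)
print(dict(edges))
print(dict(character_degrees))
```

The edge counts correspond to the global interaction network of a play, and the per-character degree is one simple example of data characterising an individual character. Weighting edges by the number of shared scenes is a choice made for this sketch only.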

  • Issue Year: 74/2023
  • Issue No: 1
  • Page Range: 244-253
  • Page Count: 10
  • Language: English