Effektiver Einsatz von NLP-Methoden am Beispiel des Codex Suprasliensis
Effective use of NLP methods using the example of the Codex Suprasliensis
Author(s): Vladimir Neumann
Subject(s): Language studies, Language and Literature Studies, Applied Linguistics, Computational linguistics, Philology, Translation Studies
Published by: Институт за литература - БАН
Keywords: Computational Philology; Natural Language Processing (NLP); Old Church Slavonic; Stanza and Corpus Annotation; DataFrame-Based Text Structuring
Summary/Abstract: The integration of computational methods into historical philology is becoming increasingly essential, yet challenges persist in harmonizing the linguistic and technical aspects of text analysis. This study presents a comprehensive and methodologically transparent use case that documents the entire computational philological workflow, from data acquisition and modeling to analysis and visualization, in a structured and reproducible manner. Using the Codex Suprasliensis, one of the most significant Old Slavic manuscripts, as a case study, we demonstrate how modern Natural Language Processing (NLP) techniques, particularly the Stanza library for morphosyntactic annotation and DataFrame-based corpus structuring, can facilitate the exploration of historical textual corpora. Special emphasis is placed on benchmarking Stanza's performance in processing Old Church Slavonic, evaluating its segmentation, tagging, and parsing accuracy against existing gold-standard datasets. Additionally, we discuss the role of DataFrame-based modeling in ensuring an efficient and transparent structuring of linguistic data, allowing for flexible transformations and reproducible analyses. To support further research and methodological validation, all functional and extensively annotated scripts, including the complete NLP pipeline, are permanently provided via the GitHub platform of the Berlin State Library. The findings highlight the importance of structured corpus processing in computational philology and contribute to the ongoing refinement of NLP methodologies for historical languages.
Journal: Scripta & e-Scripta
Issue Year: 2025
Issue No: 25
Page Range: 79-100
Page Count: 22
Language: German
 
