Tailored Fine-Tuning for Comma Insertion in Czech

Author(s): Jakub Machura, Hana Žižková, Patrik Stano, Tereza Vrabcová, Dana Hlaváčková, Ondřej Trnovec
Subject(s): Language studies, Theoretical Linguistics, Applied Linguistics, Syntax, Computational linguistics, Western Slavic Languages, ICT Information and Communications Technologies
Published by: SAV - Slovenská akadémia vied - Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: comma; Czech; fine-tuning; Large Language Model (LLM)

Summary/Abstract: Transfer learning techniques build on pre-trained Transformers, which are trained on vast amounts of text in a particular language and can be tailored to specific grammar correction tasks, such as automatic punctuation correction. The Czech pre-trained RoBERTa model demonstrates outstanding performance in this task (Machura et al. 2022); however, previous attempts to improve the model have so far led to a slight degradation (Machura et al. 2023). In this paper, we present a more targeted fine-tuning of this model, addressing linguistic phenomena that the base model overlooked. Additionally, we provide a comparison with other models trained on a more diverse dataset beyond just web texts.
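
As a rough illustration of the approach the abstract describes, the sketch below frames comma insertion as binary token classification on top of a pre-trained Czech RoBERTa. The checkpoint name ufal/robeczech-base, the labeling helper, and the toy sentence are illustrative assumptions, not the authors' actual training setup or data.

```python
# A minimal sketch, assuming the HuggingFace checkpoint "ufal/robeczech-base"
# (a Czech RoBERTa); the paper's actual fine-tuning setup is not reproduced here.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "ufal/robeczech-base"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# Two labels per word: 0 = no comma after this word, 1 = comma after it.
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=2)

def encode_example(words, comma_after):
    """Encode a comma-stripped sentence; label word i with 1 if i is in comma_after."""
    enc = tokenizer(words, is_split_into_words=True,
                    truncation=True, return_tensors="pt")
    labels = torch.full(enc["input_ids"].shape, -100)  # -100 = ignored by the loss
    previous = None
    for pos, word_id in enumerate(enc.word_ids(0)):
        if word_id is not None and word_id != previous:  # label first subword only
            labels[0, pos] = 1 if word_id in comma_after else 0
        previous = word_id
    enc["labels"] = labels
    return enc

# Toy example: "Myslím, že přijde." ("I think he will come.") with the comma
# removed; a comma belongs after word 0 ("Myslím").
batch = encode_example(["Myslím", "že", "přijde", "."], comma_after={0})

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch).loss  # cross-entropy over the two comma labels
loss.backward()
optimizer.step()
```

Framing the task this way lets the model reuse everything it learned during pre-training and only learn a thin classification head for comma placement; a targeted fine-tuning, as in the paper, would then weight or augment the training data toward the linguistic phenomena the base model overlooked.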

  • Issue Year: 76/2025
  • Issue No: 1
  • Page Range: 268-278
  • Page Count: 11
  • Language: English