Automatic Detection of the Bulgarian Evidential Renarrative

Author(s): Irina Temnikova, Ruslana Margova, Stefan Minkov, Tsvetelina Stefanova, Nevena Grigorova, Silvia Gargova, Venelin Kovatchev
Subject(s): Language and Literature Studies, Economy, Theoretical Linguistics, Applied Linguistics, Computational linguistics, ICT Information and Communications Technologies
Published by: Institute for Bulgarian Language “Prof. Lyubomir Andreychin”, Bulgarian Academy of Sciences
Keywords: evidentiality; Bulgarian; renarrative; fine-tuned BERT classifier; Python; annotation.

Summary/Abstract: Manual and automatic verification of the trustworthiness of information is an important task. Knowing whether the author of a statement was an eyewitness to the reported event(s) is a useful clue. In linguistics, such information is expressed through “evidentiality”. Evidentials are especially important in Bulgarian, as Bulgarian journalists often use a specific type of evidential (the “renarrative”) to report events that they neither directly observed nor verified. Unfortunately, there are no automatic tools to detect the Bulgarian renarrative. This article presents the first two automatic solutions for this task: a fine-tuned BERT classifier (renarrative BERT detector, BGRenBERT), which achieves 0.98 accuracy on the test split, and a rule-based renarrative detector (BGRenRules), built with regular expressions that match a parser’s output. Both solutions detect Bulgarian texts containing the most frequently encountered forms of the renarrative. Additionally, we compare the results of the two detectors with the manual annotation of subsets of two Bulgarian fake text datasets. BGRenRules obtains substantially higher results than BGRenBERT. The error analysis shows that the errors from BGRenRules most frequently correspond to cases in which humans also have doubts. The training dataset (BgRenData), the annotated dataset subsets, and the two detectors are made publicly accessible on Zenodo, GitHub, and HuggingFace. We expect that these new resources will be of invaluable assistance to 1) Bulgarian-language researchers and 2) researchers of other languages with similar phenomena, especially those working on verifying information.

  • Issue Year: 1/2025
  • Issue No: 1
  • Page Range: 61-83
  • Page Count: 23
  • Language: English