Re-imaging Versiones Slavicae
The paper discusses a strategy for transforming the Versiones Slavicae database into an XML format, which would improve opportunities for application-independent preservation and maintenance.
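The abstract does not describe the database schema or the target XML vocabulary. The sketch below illustrates only the general idea of an application-independent export. It assumes a relational table, and the table name `readings`, its columns and the element names are all hypothetical. It uses Python's standard library to write each row out as an XML record, so the data no longer depend on the database application.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Export every row of `table` as a <record> element.

    Column names become child element names, so they must be valid XML
    names; the table name is interpolated into SQL and must be trusted.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element("table", {"name": table})
    for row in cursor:
        record = ET.SubElement(root, "record")
        for column, value in zip(columns, row):
            # NULLs become empty elements rather than being dropped.
            ET.SubElement(record, column).text = "" if value is None else str(value)
    return root


# Hypothetical in-memory table standing in for the original database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ref TEXT, witness TEXT, reading TEXT)")
conn.execute("INSERT INTO readings VALUES ('Mt 1:1', 'witness A', 'sample reading')")
root = table_to_xml(conn, "readings")
print(ET.tostring(root, encoding="unicode"))
```

A real conversion would map fields onto an established vocabulary rather than onto raw column names. A generic dump like this one is merely the first step.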
The article presents some results of analyses of the Vienna part of the Codex Marianus (ÖNB, Vind. slav. 146). The analyses were undertaken by an interdisciplinary group of scholars and scientists from the Centre of Image and Material Analysis in Cultural Heritage (CIMA, www.cima.or.at) within two Austrian Science Fund projects devoted to the ancient Glagolitic heritage. The investigation consisted of four parts: codicological, multispectral, chemical and philological. The codicological survey aimed to gather as much information as possible about the writing material (source of the parchment, methods of preparation, writing process, deletions, condition). Color and multispectral images were recorded to preserve the manuscript in the best possible form and to provide a sound basis for further investigation. The chemical analysis was carried out with two portable spectrometers (XRF and rFTIR). It aimed to obtain exact information on the parchment, inks, paints and binders, and to collect data for a comparative study of parchment degradation. The philologists compared the fragment with all other preserved Old Church Slavonic Glagolitic manuscripts in order to learn as much as possible about their scribes.
The paper defines the basic principles for creating an electronic corpus of Serbian medieval charters and letters. The corpus aims at maximum representativeness, determined entirely by the preserved written legacy (manuscripts, microfilms or photographs). This makes the principle of balance unnecessary. The principle of reliability is still satisfied, since charters and letters known only from editions are not included. Texts are selected according to a diplomatic criterion. Transcripts and copies of documents that survive in the original are excluded, as are later transcripts that are chronologically and linguistically distant from the assumed original. This approach to selection is justified by the size of the corpus and by the exceptional cultural and historical significance of medieval charters and letters. The metadata for corpus texts are defined by their general diplomatic properties and by the needs of corpus searches for diatopic, diachronic and genre variation. Conversion of the texts into electronic form strives for fidelity to the original. Abbreviations, superscript letters and original punctuation are preserved, while accent marks and modern capitalization rules are not applied.
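The selection rules can be expressed as a simple filter. The sketch below is hypothetical throughout. The `Witness` fields, the labels `"original"`/`"transcript"` and the `max_gap` threshold are assumptions, with `max_gap` standing in for "chronologically distant". Linguistic distance is not modelled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Witness:
    doc_id: str        # the charter or letter that this witness transmits
    kind: str          # "original" or "transcript" (hypothetical labels)
    year: int          # date of the witness itself
    assumed_year: int  # assumed date of the original document
    source: str        # "manuscript", "microfilm", "photograph" or "edition"


def select_for_corpus(witnesses: list[Witness], max_gap: int = 50) -> list[Witness]:
    """Keep witnesses that satisfy the (hypothetical) selection rules."""
    # Documents for which an original survives in a primary source.
    originals = {w.doc_id for w in witnesses
                 if w.kind == "original" and w.source != "edition"}
    selected = []
    for w in witnesses:
        if w.source == "edition":
            continue  # known only from an edition: excluded for reliability
        if w.kind == "original":
            selected.append(w)
        elif w.doc_id not in originals and w.year - w.assumed_year <= max_gap:
            # A transcript is kept only if no original survives and it is
            # not too late relative to the assumed original.
            selected.append(w)
    return selected


# Invented sample witnesses for illustration.
sample = [
    Witness("A", "original", 1300, 1300, "manuscript"),
    Witness("A", "transcript", 1350, 1300, "manuscript"),
    Witness("B", "transcript", 1320, 1300, "photograph"),
    Witness("C", "transcript", 1450, 1300, "microfilm"),
    Witness("D", "original", 1310, 1310, "edition"),
]
kept = select_for_corpus(sample)
```

In this sample, the transcript of A is dropped because A's original survives. The transcript of C is dropped as too late, and D is dropped because it is known only from an edition.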
The article proposes a minimal set of criteria for the sentence segmentation of medieval texts. Segmentation is an obligatory stage in corpus processing and annotation, especially for syntactic annotation. The article first reviews different definitions of the sentence (unit) and different approaches to sentence segmentation. It then discusses various criteria (structural, thematic and graphic) in order to define the minimal ones. The discussion of the different factors is illustrated by sample sentences from two texts of the 14th and 17th centuries. The proposed criteria focus mainly on structural characteristics and try to avoid textual and semantic interpretation. Even structural criteria can present challenges, however, because interpreting the (syntactic) structure is inevitably tied to interpreting the (semantic) content.
The St Petersburg Corpus of Hagiographic Texts (SCAT) has launched two new mark-up formats. The first is a comprehensive format for dividing hagiographic texts into parts. It covers both parts explicitly marked by section headings and parts inferred by comparison with texts of a similar genre. The second is an elaborate format representing the full range of biblical, patristic and liturgical quotations occurring in the lives of saints. So far, three morphologically annotated manuscript texts have been marked up according to these guidelines, and two more texts are to be added in the near future. The project cooperates closely with the IHRIM research laboratory (Lyon) and makes extensive use of its techniques and technology. This makes it possible to obtain illuminating cross-format statistics and thus to offer new insights into the canons and rules of Old Russian hagiography.
The neural network tagger CLStM has been applied to the Old Russian Žitie Evfimija Velikogo (GIM, Chud. 20), a copy from the second half of the 14th century. The tagger's main strength is that it can automatically annotate an orthographically non-normalized text of dozens of pages within a few minutes. It achieves high accuracy for part of speech and morphological features. Moreover, the tagger can largely disambiguate case syncretism, even in split constructions. Manually correcting the automatic tagging yields a correctly tagged text considerably faster than using a rule-based tagger or tagging entirely by hand. The weaknesses of the CLStM tagger include some incorrect POS tags. It also sometimes assigns morphological categories incompletely or incorrectly to some parts of speech. Superscript letters and punctuation can pose special problems, and normalizing punctuation leads to better tagging results. The proportion of correct tags is higher when the token was seen during training, while unknown (out-of-vocabulary, OOV) words show a higher error rate. The paper analyzes the strengths and weaknesses of the tagger using specific examples. It also demonstrates how automatically tagged, uncorrected data can be used for quantitative analysis.
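The gap between known and OOV tokens can be measured by splitting tagging accuracy according to whether a token occurred in the training data. The minimal sketch below does this. The tokens, tags and training vocabulary are made-up toy values, not data from the paper. A tag counts as correct only if the whole tag string matches the gold tag.

```python
def accuracy_by_vocabulary(tokens, gold, predicted, training_vocab):
    """Return (accuracy on known tokens, accuracy on OOV tokens).

    A token counts as known if it occurred in the training data; a tag is
    correct only if the full predicted tag equals the gold tag.
    """
    counts = {"known": [0, 0], "oov": [0, 0]}  # [correct, total]
    for token, gold_tag, predicted_tag in zip(tokens, gold, predicted):
        bucket = counts["known" if token in training_vocab else "oov"]
        bucket[1] += 1
        if gold_tag == predicted_tag:
            bucket[0] += 1
    # NaN signals an empty bucket instead of raising ZeroDivisionError.
    return tuple(c / t if t else float("nan") for c, t in counts.values())


# Toy example: tok1 and tok2 were seen in training, tok3 and tok4 were not.
tokens = ["tok1", "tok2", "tok3", "tok4"]
gold = ["NOUN.Nom.Sg", "VERB.Aor.3Sg", "NOUN.Acc.Sg", "ADJ.Gen.Pl"]
predicted = ["NOUN.Nom.Sg", "VERB.Aor.3Sg", "NOUN.Acc.Sg", "ADJ.Loc.Pl"]
known_acc, oov_acc = accuracy_by_vocabulary(tokens, gold, predicted, {"tok1", "tok2"})
```

Comparing feature by feature instead of whole tags would show which categories (case, number, tense) are responsible for OOV errors.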
The paper presents results, including work in progress, related to two databases of “non-bookish” (vernacular) Old East Slavic writing, namely the databases of birchbark letters and of epigraphy. The aim of the project is to interlink visual, archeological/historical and linguistic information. The epigraphical database represents different interpretations of a single inscription, outlining the versions proposed in the existing literature. These sources, an archeographical database and a linguistic corpus forming part of the larger Russian National Corpus, are designed to be easily synchronized, expanded and updated. The project includes an online workstation for the morphological annotation of texts. One important function of this platform is to create an index to the corpus that can be used in the linguistic description of the dialect. A second is to verify the index and data of Andrei Zaliznjak's book “Old Novgorod Dialect. Addenda”, which is being prepared for posthumous publication. New linguistic discoveries have been made during the implementation of the project.
The paper demonstrates methods and techniques for eliminating the variation of linguistic units in the transcriptions of medieval Slavonic manuscripts in the historical corpus “Manuscript” (manuscripts.ru). The corpus consists of machine-readable copies that resemble the originals as closely as possible. It provides tools for transforming (modifying) linguistic units, so that users can build queries and obtain results suited to the task at hand. For inexact search, the user can:
- delete titlos and diacritics;
- reduce letter variants to their basic form;
- specify a mask for the searched units as a regular expression;
- use letters of the modern Cyrillic alphabet.

For the statistical modules of the corpus to operate on lemmas, each textual form must be automatically assigned to exactly one lemma. Because of grammatical homonymy, incorrect lemmatization would cause quantitative data based on word forms and data based on lemmas to diverge. To assign word forms to the correct lemma, we apply a rule-based approach that takes into account the formal and quantitative characteristics of the linguistic units. These include their morphological variability or invariability, their frequency in the sub-corpus and whether or not they match the lemma form. They also include the frequency of relations between textual forms and the dictionary paradigms of inflected words, and the results of manual disambiguation of homonymy. Reducing textual forms to unified, normalized, transliterated or initial forms is a necessary step in extracting data from the historical corpus for the distributional-statistical analysis of the semantics of linguistic units.
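A rough approximation of the inexact-search options listed above (deleting titlos and diacritics, reducing letter variants, matching a regular-expression mask) can be written with Python's standard library. This is a sketch under stated assumptions, not the corpus's own implementation. The `LETTER_VARIANTS` table is a small hypothetical sample. Removing every Unicode combining mark is a blunt choice that also turns, for example, й into и.

```python
import re
import unicodedata

# Hypothetical sample of letter variants mapped to a base letter; the
# corpus's own reduction tables are not given in the abstract.
LETTER_VARIANTS = str.maketrans({"ѡ": "о", "ꙋ": "у", "і": "и", "ѳ": "ф", "ѕ": "з"})


def normalize(form: str) -> str:
    """Lowercase, drop combining marks (the titlo U+0483 and other
    diacritics), then reduce letter variants to their base form."""
    decomposed = unicodedata.normalize("NFD", form.lower())
    no_marks = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return no_marks.translate(LETTER_VARIANTS)


def inexact_search(mask: str, tokens: list[str]) -> list[str]:
    """Return the original tokens whose normalized form fully matches
    the regular-expression mask (written in normalized spelling)."""
    pattern = re.compile(mask)
    return [tok for tok in tokens if pattern.fullmatch(normalize(tok))]


# "бл" + titlo + "гословенъ": an abbreviated form with a combining titlo.
tokens = ["бл\u0483гословенъ", "благословенъ", "ѡтьць"]
hits = inexact_search(r"бла?гословенъ", tokens)
```

The optional `а?` in the mask lets one query retrieve both the abbreviated and the full spelling. The tokens are returned in their original, unnormalized form.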
The Internet has undeniably changed people's lives in many respects, including language, leading to the emergence of what is now called Netspeak. Opinions on the emergence of this linguistic variety differ, but Internet-driven language change is clearly a growing phenomenon. In this paper I examine the emergence of Netspeak, the linguistic changes that give rise to this variety, and Netspeak as a worldwide phenomenon. I show that English undoubtedly dominates the Internet today, but other languages can and do undergo similar changes and form Internet varieties of their own. The paper discusses globalization in the context of the Internet, which is seen as a social rather than a technological construct that helps people connect with one another and whose dominant language is English. It also discusses regionalization, as different language communities shape their own identities on the Internet.
This article discusses an important type of public communication in Lithuania: internet blogs written in Polish. The study shows how they function, outlines the reasons for their creation and analyses the topics they address. As this is a sphere of uncontrolled communication, the author attempts to present various social attitudes and to analyse subjective opinions on cultural and identity issues. The linguistic means of expressing them are an equally important concern. The study assumes that the internet, although little explored so far, can be a very good basis for research on how ethnic identities are shaped. The analysis of the genres of texts posted on blogs and social networking sites shows that they are characterised by rhetorical coherence and are dialogical in nature. They also enter into various relationships with one another, forming a blogosphere. They are usually written in the standard language with regional features. These regional features are often used as a group marker, a factor that integrates the internet community. Because the authors of these websites are usually young people, their statements are commonly marked as colloquial, exaggerate certain events and use stylistically marked vocabulary. What we see here is colloquial style preserved in written form.
This paper investigates intensification from a semantic point of view. It focuses on intensifiers that specify the degree of a property and at the same time carry other semantic features. In communication, for example, they can express what feelings a given utterance evokes in the speaker. Intensifiers and their collocates are analysed on Finnish material. A comparison of two groups of intensifiers shows that the semantic features of the intensifiers themselves affect their collocability. It also shows that apparently synonymous intensifiers have different semantic preferences.
This study summarizes a corpus-based analysis of trends in register variation in Czech-language fiction from 1992 to 2018. The analysis projects results from a large sample of Czech prose texts (1,070 texts, 12.7 million words) onto a general register model established by previous research using multidimensional analysis. The main trends found in the material are a decrease in cohesion, addressee coding and retrospective narration, and an increase in polythematicity/lexical richness. These findings are supplemented by additional analyses of how the results are affected by three factors. These are translation, the position of the excerpt within the original text (beginning, middle or end) and text type.
This study investigates the extent to which computational thinking can be developed through constructionism-based accounting spreadsheet activities. It uses a mixed-methods design that combines a participatory qualitative approach and a descriptive quantitative approach. Data were collected through documentation (college students' artefacts) and classroom observations. The results show that constructionism-based accounting spreadsheet design can build and facilitate the development of computational thinking. Students' emotional and social engagement while executing a design plan can foster curiosity and strong enthusiasm for completing the design together. This engagement can reduce the cognitive load students experience when learning to program with Visual Basic for Applications (VBA) in Excel. The study suggests how learning practitioners can improve students' competences so that they can compete in the digital era. It can also serve as a basis for further research that empirically investigates the impact of computational thinking development.
Review of: T. Avgustinova. Word Order and Clitics in Bulgarian [Saarbrücken Dissertations in Computational Linguistics and Language Technology, Volume 5]. Saarbrücken, 1998. 184 p.
This paper studies statistical implicational universals in the 30-language sample from Joseph Greenberg's classic paper (1966). It points out some problems in the universals proposed by Greenberg and presents 43 previously undiscovered universals of this type. The entire text of the article was generated by the computer program UNIVAUTO (UNIVersals Authoring TOol). Only the formatting required by the journal's style sheet was added manually. A brief description of this program, as well as another article generated by it, were previously published in this journal (Contrastive Linguistics 1999, issue 4).
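The sketch below shows what searching for implicational universals of the form "if a language has A, it has B" can look like computationally. It checks every ordered feature pair in a language-by-feature table. It uses a hypothetical counting criterion (minimum support and a maximum number of exceptions), not UNIVAUTO's actual procedure. It applies no test of significance against chance. The toy table is invented for illustration and is not Greenberg's data.

```python
from itertools import permutations


def implicational_universals(table, min_support=3, max_exceptions=0):
    """Find pairs (A, B) such that languages with A (almost) always have B.

    `table` maps a language name to the set of features it has. A pair is
    reported when at least `min_support` languages have A, at most
    `max_exceptions` of them lack B, and B is not present in every
    language (otherwise A -> B would hold trivially).
    """
    features = sorted(set().union(*table.values()))
    results = []
    for a, b in permutations(features, 2):
        with_a = [lang for lang, feats in table.items() if a in feats]
        exceptions = [lang for lang in with_a if b not in table[lang]]
        b_is_universal = all(b in feats for feats in table.values())
        if (len(with_a) >= min_support and len(exceptions) <= max_exceptions
                and not b_is_universal):
            results.append((a, b, len(with_a), len(exceptions)))
    return results


# Invented toy sample, not Greenberg's data.
toy = {
    "L1": {"VSO", "Prep"},
    "L2": {"VSO", "Prep"},
    "L3": {"VSO", "Prep"},
    "L4": {"SOV", "Postp"},
    "L5": {"SOV", "Postp"},
    "L6": {"SVO", "Prep"},
}
found = implicational_universals(toy)
```

Each result is reported as (A, B, number of languages with A, number of exceptions). Raising `max_exceptions` admits near-universals that have a few counterexamples. A serious search would also compare the observed co-occurrence with what chance would predict.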
The article is devoted to the linguistic means of depreciating Ukraine as an independent state. The analyzed material shows that the linguistic level of the discourse under study reflects several ways of depreciating and delegitimizing Ukraine as an independent state. The first is the almost mechanical replacement of the name Украина 'Ukraine' with Малороссия 'Little Russia', which stems from a simple denial of Ukraine's right to independent existence. There are also units that express arguments characteristic of imperial discourse. One argument is that the state lacks real independence (e.g. филиал 'branch office', укрпроект 'Ukr-project'). Another is that its authorities came to power through illegal procedures (государственный переворот 'coup d'état') and are illegitimate and violent (e.g. хунта 'junta', диктатура 'dictatorship'). A third is that chaos prevails on the territory of Ukraine (мазепинская самостийность 'Mazepist self-rule', самостихийность, a pun on самостийность 'self-rule' and стихийность 'chaos, spontaneity'). The Orthodox variety of imperial discourse, by contrast, is characterized by even an indirect appeal to essentially medieval religious argumentation. It points to the non-Christian character of the Ukrainian authorities (безбожная власть 'godless authorities').
This paper analyzes internet memes related to Covid-19. We analyzed more than 200 memes collected over nine months. Using Blending Theory and Discourse Viewpoint, we attempt to explain the creative inner workings of memes and how meaning is negotiated on the internet. We were clearly able to detect memes that followed the actual development of Covid-19 in real time. We show that meme makers use visuals metonymically to address the current state of Covid-19, while the overall message of memes is driven by simile. Just as memes draw on the concept of Covid-19, they also feed back into it in a loop of self-reference. Besides their underlying metaphoric nature, memes convey a feels-like attitude. Their usage falls into two main phases, the Observer Phase and the Experiencer Phase. In the Observer Phase, Covid-19 was not yet a pandemic and was perceived through media coverage from elsewhere. In the Experiencer Phase, the memes clearly show that their creators had experienced the virus themselves. For the timeframe covered, however, we conclude that memes do not show full conceptual integration, as Covid-19 was not yet fully entrenched.
This article explores the connection between artificial intelligence (AI) and language learning in the context of Education 4.0. It highlights how AI, together with other emerging technologies and innovations in education, is revolutionizing language learning. The article discusses how AI improves language learning through personalized learning experiences, interactive practice and automated assessment. AI can be used to create diverse learning materials and immersive experiences aligned with the principles of Education 4.0. When used appropriately, AI can bring numerous benefits to language learning. These include greater efficiency, greater student engagement in the teaching-learning process, and access to content from anywhere and on any device. The article also emphasizes the need to adopt Education 4.0 together with content that equips students with the skills they need in the digital age. Finally, it highlights the importance of integrating AI and Education 4.0 in language learning to promote critical thinking, problem-solving skills and digital literacy.
This article presents the project Assessing the Reading Literacy and Comprehension of Early Graders in Bulgaria and Italy. The project is an international collaboration between two partner organisations. The first is the Institute for Bulgarian Language Prof. Lyubomir Andreychin (BAS), represented by participants from its Department of Computational Linguistics. The second is the Institute for Computational Linguistics A. Zampolli in Pisa, Italy. The main goal of the project is to research and assess the reading skills of primary school students using modern language technologies.