The requirements for historical corpora of medieval texts 1) are determined by the properties of the data and by the historical-linguistic, textological and linguo-textological tasks to be solved, and 2) should be realized with the help of special tagging, processing procedures, query parameters and retrieval demonstrations. The corpus should a) have metadata concerning both texts and manuscripts, and involving both linguistic and analytical tagging; b) support the rendering of documents (facsimile and transcription), concordances, lists, and comparison of subcorpora data; c) simplify graphic-orthographic variation during data search and visualization; d) provide tools both for processing and searching linguistic material and for its further analysis according to traditional methods; and e) support problem description and resolution by applying corpus methods that engage with the quantity, distribution, co-occurrence, and variation of linguistic units in big data arrays. The realization of these requirements is demonstrated on a subcorpus of three copies of chronicles (Laurentian, Hypatian, Radzivilovsky) from the historical corpus project “Manuscript” (manuscripts.ru).
The article is devoted to the analysis of the textual and linguistic features of the penitential texts contained in two Kormčie: the Novgorod Kormčaja of 1282 (GIM, Synodal Collection, № 132) and the Ustjug Kormčaja (RSL, Rumjancev Collection № 230, 13th–14th century). The penitential texts “The Typikon of the Monastery of St John of Pantelleria”, the penitential rules of Basil the Great and of Theodore Studite (“O ostanceh cerkovnyh”) and “Rules for Monks” are examined. It is noted that all these texts appear in both Kormčie in the same translations; however, they were edited to varying degrees, including linguistically. It is concluded that, when examining a Kormčaja, it is necessary to take into account the heterogeneity of its composition and to carry out the analysis not comprehensively, but article by article. As a result of the study, it can be said that some penitential texts are presented in an older version in the Ustjug Kormčaja, while others point to the archaic version in the Novgorod-Varsonof’ev group.
The article deals with the textual features of five Menologies that form part of the so-called Great Horologia of the 15th century. The division of these Menologies into two groups is established. The first group reflects the features of the Menologies that go back to the Russian translation of the Jerusalem Typicon of the middle of the 14th century. The manuscripts of this group come from the greatest monastic centers, namely the Trinity Lavra of St Sergius and the St Joseph’s Monastery near Volok Town. They show that the older Russian translation of the Jerusalem Typicon, known in its complete form in only two manuscripts, was more widely distributed in Russia than was previously believed. The Menology of the printed Horologion published by Schweipolt Fiol in 1491 attests to the prominence of the older Russian translation of the Jerusalem Typicon in Western Russia as well. The second group goes back to the late Russian version of the Jerusalem Typicon of 67 chapters. This version supersedes the Menology of the older Russian translation of the Jerusalem Typicon in the 16th century.
The article proposes a minimal set of criteria for the sentence segmentation of medieval texts, an obligatory stage in corpus processing and annotation, especially with respect to syntactic annotation. In the context of a review of different definitions of a sentence (unit) and approaches to sentence segmentation, various criteria (structural, thematic, graphic) are discussed on the basis of sample sentences in order to define the minimal criteria. The discussion of the different factors is illustrated by sample sentences from two texts from the 14th and 17th centuries. The proposed criteria aim at considering mainly structural characteristics while trying to avoid textual and semantic interpretation, though these can also present challenges because the interpretation of the (syntactic) structure is inevitably related to the interpretation of the (semantic) content.
The St Petersburg Corpus of Hagiographic Texts (SCAT) has launched two new mark-up formats. The first innovation is a comprehensive format developed for the division of hagiographic texts into parts, which are both explicitly marked as section headings and extrapolated through comparison with texts of a similar genre. The second innovation is an elaborate format representing the full range of biblical, patristic and liturgical quotations occurring in the lives of saints. For the time being, three morphologically annotated manuscript texts have been marked up according to these guidelines, and we are planning to add two more texts in the near future. Close cooperation with the IHRIM research laboratory (Lyon) and wide use of their techniques and technology make it possible to obtain some illuminating cross-format statistical data and thus offer new insights into the canons and rules of Old Russian hagiography.
The neural network tagger CLStM has been applied to the Old Russian Žitie Evfimija Velikogo (GIM, Chud. 20), a copy of the second half of the 14th century. The strengths of this tagger consist in its ability to automatically annotate an orthographically non-normalized text of dozens of pages within a few minutes, yielding high accuracy with respect to part of speech and morphological features. Moreover, the tagger is capable of disambiguating case syncretism to a large extent, even in split constructions. Manual correction of the automatic tagging will result in a correctly tagged text considerably faster than when using a rule-based tagger or tagging completely manually. The weaknesses of the CLStM tagger comprise certain examples of incorrect POS tagging and the sometimes incomplete or incorrect attribution of morphological categories to some parts of speech. Superscript letters and punctuation can pose special problems; normalization of punctuation will achieve better tagging results. The proportion of correct tags is higher when the token has been seen during the training process; unknown (out-of-vocabulary, OOV) words show a higher error rate. In the paper, we analyze the strengths and weaknesses of the tagger by providing specific examples. Furthermore, we demonstrate how to use automatically tagged, uncorrected data for quantitative analysis.
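The contrast between seen and out-of-vocabulary tokens mentioned in the abstract above can be illustrated with a minimal sketch. It is not the evaluation procedure of the paper: the function name, the data layout (aligned lists of token/tag pairs) and the tag labels are assumptions made only for illustration.

```python
def accuracy_by_vocabulary(gold, predicted, training_vocab):
    """Return tagging accuracy separately for seen and OOV tokens.

    gold, predicted: lists of (token, tag) pairs aligned by position
    (a hypothetical layout, not the tagger's actual output format).
    training_vocab: set of token forms that occurred in the training data.
    """
    counts = {"seen": [0, 0], "oov": [0, 0]}  # [correct, total] per bucket
    for (token, gold_tag), (_, pred_tag) in zip(gold, predicted):
        bucket = "seen" if token in training_vocab else "oov"
        counts[bucket][1] += 1
        if gold_tag == pred_tag:
            counts[bucket][0] += 1
    # None marks a bucket that contains no tokens at all.
    return {
        name: (correct / total if total else None)
        for name, (correct, total) in counts.items()
    }


# Toy data with made-up tags; the last token is unknown to the "training" vocabulary.
gold = [("и", "CONJ"), ("рече", "VERB"), ("старьць", "NOUN"), ("блаженыи", "ADJ")]
predicted = [("и", "CONJ"), ("рече", "VERB"), ("старьць", "NOUN"), ("блаженыи", "NOUN")]
vocab = {"и", "рече", "старьць"}
result = accuracy_by_vocabulary(gold, predicted, vocab)
```

Reporting the two buckets separately, rather than a single overall figure, shows how much of the error rate is attributable to unknown word forms.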
The paper presents results, including work in progress, related to two databases of “non-bookish” / vernacular Old East Slavic writing, viz. the databases of birchbark letters and epigraphy. The aim of the project is the interlinking of visual, archeological/historical and linguistic information. The epigraphical database represents different interpretations of a single inscription, providing an outline of the versions proposed in the existing literature. These sources, an archeographical database and a linguistic corpus forming part of the larger Russian National Corpus, are intended to be easily synchronized, expanded, and updated. An online work station for the morphological annotation of texts is a part of this project. An important function performed by this platform is creating an index to the corpus that can be used in the linguistic description of the dialect, and verifying the index and the data of the book Old Novgorod Dialect. Addenda by Andrei Zaliznjak, which is being prepared for posthumous publication. New linguistic discoveries have been made during the implementation of the project.
The work demonstrates methods and techniques for eliminating the variation of linguistic units in the transcriptions of medieval Slavonic manuscripts in the historical corpus “Manuscript” (manuscripts.ru). The textual corpus, whose material consists of machine-readable copies that resemble the originals as closely as possible, provides the user with tools for transforming (modifying) linguistic units, which make it possible to create queries and retrieve results appropriate to the task to be solved. For an inexact search, the user can delete titlos and diacritics, reduce letter variants to their basic form, specify the mask of the linguistic units being searched for as a regular expression, and use the letters of the contemporary Cyrillic alphabet. To ensure operations over lemmas by means of the statistical modules of the corpus, it is necessary to automatically assign a given textual form to exactly one lemma. Due to grammatical homonymy, incorrect lemmatization would result in a situation where quantitative data based on word forms and data based on lemmas do not match each other. In order to assign word forms to the correct lemma, we apply a rule-based approach, taking into account the formal and quantitative characteristics of the linguistic units (such as their morphological variation or invariation, their frequency in the sub-corpus, whether or not they match the lemma form, the frequency of relationships between the textual forms and dictionary paradigms of variable words, and the results of manual elimination of homonymy). The reduction of textual forms to unified, normalized, transliterated or initial forms is a necessary procedure for extracting data from the historical corpus for the distributive-statistical analysis of the semantics of linguistic units.
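The inexact-search options listed in the abstract above (deleting titlos and diacritics, reducing letter variants, matching a regular-expression mask) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used by the “Manuscript” corpus: the function names and the small letter-variant table are hypothetical, and the table is deliberately incomplete.

```python
import re
import unicodedata

# Hypothetical, incomplete table reducing letter variants to a basic form.
LETTER_VARIANTS = {
    "ѡ": "о",  # omega
    "ꙋ": "у",  # monograph uk
    "і": "и",  # decimal i
    "ѕ": "з",  # zelo
}


def strip_marks(text):
    """Delete titlos and other combining diacritics.

    The text is decomposed (NFD) and every nonspacing mark (category Mn),
    which includes the combining titlo U+0483, is dropped. Side effect:
    a precomposed letter such as "й" loses its breve and becomes "и".
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize(text):
    """Lowercase, remove diacritics, and reduce letter variants."""
    stripped = strip_marks(text.lower())
    return "".join(LETTER_VARIANTS.get(ch, ch) for ch in stripped)


def inexact_search(mask, forms):
    """Return the original forms whose normalized spelling fully matches the mask."""
    pattern = re.compile(mask)
    return [form for form in forms if pattern.fullmatch(normalize(form))]


# Toy word list: an abbreviation under a titlo, an accented form,
# a form spelled with omega, and an unrelated word.
forms = ["бг\u0483ъ", "бо\u0301гъ", "бѡгъ", "градъ"]
hits = inexact_search("бо?гъ", forms)
```

Matching against the normalized spelling while returning the original forms keeps the transcription intact and confines the simplification to the query side.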
The article deals with various efforts of the Staatsbibliothek zu Berlin (SBB) to make the content of its collection of about 250 Church Slavonic prints from the 17th to the 19th century accessible using modern information technology methods from the Digital Humanities. The focus is on full-text indexing of the heterogeneous Church Slavonic prints using HTR+ language models from the programme Transkribus. Depending on whether they are Moscow, Kiev or Old Believer prints, these models require different approaches and corresponding adaptations that take into account the printing area and printing period. Prints such as Kirillova kniga (1644) or Gistorija Ioanna Damaskina (1637) and many others are processed at large scale, whereby the developed character recognition models are constantly refined by training on new verified data. The full texts generated in this way are permanently stored in various XML formats (ALTO, PAGE), on the one hand in a central repository for subsequent use, and on the other hand merged with the original digital copies in the IIIF-compatible Digital Library of the SBB. As a further element, the Church Slavonic full texts will be indexed using special Solr analyzers for efficient searches (tokenizing, transliteration, n-grams) and made searchable in subject portals (including the Slavistik-Portal) using modern text-image web design.
We report on applying Handwritten Text Recognition (HTR) to manuscripts from the archive of Konstantin Rychkov preserved at IOM RAS, St. Petersburg, within the INEL project. Folklore texts in Evenki (Tungusic) were collected in Western Siberia in the 1910s. We used services provided by the Transkribus platform. The necessary step of Layout Analysis proved to be time-consuming because the parallel Evenki-Russian text is arranged on the page without a strict separation line. HTR models have been trained successively on different amounts of data up to 521 pages. The best Character Error Rate (CER) attained on validation data for the largest dataset is 4.50% for models trained on all characters. The distribution of errors is non-uniform: most errors are due to just a few problematic issues, especially diacritics such as the accent marking stress. It is written high above the line and is frequently cut off from the line images at the preprocessing stage. After excluding the stress mark from training data and recognition, the lowest CER dropped to 2.90%. We compared two recognition engines, HTR+ and PyLaia. The HTR+ model trained without stress marks made fewer errors in letters, while PyLaia performed better with respect to diacritics.
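Character Error Rate is conventionally defined as the character-level edit distance between the recognized text and the reference transcription, divided by the length of the reference. The sketch below follows that standard definition and is not the Transkribus implementation. The function names are illustrative, and encoding the stress mark as the combining acute accent U+0301 is an assumption.

```python
def levenshtein(reference, hypothesis):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character Error Rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def drop_stress(text, stress_mark="\u0301"):
    """Remove the stress mark (assumed here to be the combining acute accent)."""
    return text.replace(stress_mark, "")


# Toy example: the recognizer missed the stress mark but read the letters correctly.
reference = "a\u0301bc"
hypothesis = "abc"
with_stress = cer(reference, hypothesis)
without_stress = cer(drop_stress(reference), drop_stress(hypothesis))
```

Scoring the same output with and without the stress mark separates errors in letters from errors in diacritics, which is the distinction the abstract draws between the two sets of models.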
The author compares the marginal glosses in the book of Epifanij Slavinetskij’s Sbornik perevodov, 1665, with the text of Athanasius’ Third Oration against the Arians in Gavrilo Venclović’s Razglagolnik, 1734. The marginal glosses in Epifanij’s Russian Version are taken from a South Slavonic manuscript that has a common origin with the protograph of Venclović. The Orationes contra Arianos in Razglagolnik are written in South Slavonic koine and their source has the features of an Athonite translation related to the Council of Ferrara-Florence and the disputes over the filioque.
The text transmission of the Slavonic translation of Hippolytus’ De Christo et Antichristo presents a stable and well-testified tradition. It gives a base for possible reconstruction of the Greek original from which this translation was made. The article demonstrates some omissions, additions, and reconstructions on the Greek text compared to the Slavonic one. Also, the paper addresses significant problems that occur in the scholars’ work on bilingual dictionaries discussing possible approaches and solutions. Still, some questions remain, and it is not easy to suggest a definite answer to them. The author underlines the importance of the fragmentary copy of the Greek text, presented in the manuscript of Meteora 573, bearing in mind its significant correspondence to the Slavonic tradition. Unfortunately, this manuscript preserves only trifling fragments of the whole work by Hippolytus of Rome.
The article focuses on Old Slavonic versions of Euthalian chapter-lists to Acts and Epistles considering meta-communicative terms, such as παραίνεσις or προοίμιον. The author aims to evaluate the level of accuracy of Slavonic translations and their exegetical potential, which makes the content of the main text of Acts and Epistles clear. The analysis reveals two tendencies prevailing in Slavonic sources from the 12th–16th centuries: first, there are phenomena of lexical variability, as results of applying various translation strategies, more or less successful in terms of the accuracy and clarity of the resulting text (calques, periphrastic constructions, and text expansion). Second, there is a tendency towards unification, suggesting a universal Slavonic term for several Greek correlates. Authoritative dictionaries, including academic ones, do not record some lexemes. There is no dependence of the chapter-lists lexicon on the main text vocabulary.
The focus of this report is the still-unexplored Interpretation of Orthodox liturgy, attested in two copies: the first in manuscript No. 88 from the collection of Obolensky (201), State Archive of the Russian Federation (Moscow), the second in manuscript No. 52 of 1567, from the Archive of Baltazar Bogišić in Cavtat. The two manuscripts contain proven original works of Constantine of Kostenets (1380–1431). The author analyzes the structure and content of the interpretation and comments on it as a source for the history of the Liturgy, from the point of view of the data concerning the liturgical features described in it. It can be concluded that the basis of the texts in MS No. 88 and MS Bogišić 52 is a late composition of Byzantine mystagogy, which, in turn, means that the South Slavic translation should be dated no earlier than the end of the 12th century. This is one of the many short epitomes created during the Second Bulgarian Kingdom as a result of the secondary reduction of the original extensive commentary. A detailed investigation and a text-critical edition will be forthcoming.
Christian hagiographic literature chooses as its heroes people whose feeble flesh is in stark contrast to the greatness of their spirit. The saint is endowed with supernatural knowledge, works miracles, foretells, protects and heals people. The author discusses the healing practices in the Slavic translation of the Vita of St. Gregory of Agrigentum, as presented in a 15th-century copy of a Reading Menaion composed according to the Jerusalem typikon (Tărnovo type orthography, Moldavian provenance, kept in the Dragomirna monastery (Drag 706/1795), Romania). St. Gregory of Agrigentum is a senior clergyman, and the goal of his healing skills is to show God’s grace and the power of Christian teaching on the one hand and the authority of his position as bishop on the other.
The paper provides an overview of current issues concerning the metalinguistic inventory used in contrastive investigations of contemporary English and Serbian. Modern contrastive linguistics (CL) has largely shifted its methodological focus from the elaboration of theoretical prerequisites towards matters connected with the electronic processing of large amounts of linguistic data. Consequently, there is a need to revisit the problems of terminological discrepancies found in the different frameworks used for the description of the compared languages. Problems arise on at least four levels: 1. restrictions imposed by the structure of the two languages compared; 2. the model-specific use of particular terms; 3. a semantically associative, but potentially misleading interpretative potential of linguistic terms; 4. the inconsistent or underspecified use of the metalinguistic units pertaining to a particular level of linguistic analysis or respective linguistic traditions. Having investigated the observed pitfalls, the paper concludes that the metalinguistic apparatus of CL needs to be determined more precisely and suggests that corpus linguistics may offer a meeting ground for overcoming these obstacles.
The aim of the paper is to investigate the possibilities of combining different aspectual categories in the same sentence in French and the semantic effects resulting from such combinations. We examine the possibilities and consequences of combining the perfective and imperfective aspectual perspective, on the one hand, with four types of situations as carriers of the lexical aspect, on the other hand. The theoretical framework of the work is represented by the theory of two-component aspectuality. The research material consists of a corpus of examples mostly from the literary French language. It was established that all four types of situations, activities, states, accomplishments and achievements, can be combined with both the perfective and the imperfective aspectual perspective, with significant consequences on the semantic level since different meanings and semantic and stylistic nuances are generated. It often happens that the syntactic environment of the verb affects the change of the type of situation, so in the overall interpretation of aspectual meanings, the complements of the verb and sentence clauses must be taken into account. This paper sheds light on the complex issue of the combination of syntax and semantics in the French language, which manifests itself in highlighting certain phases of the situation or the entirety of its interval, and contributes to the clarification of not so simple questions for all those who teach and learn the French language.
This paper deals with the analysis of idioms containing the lexeme “eye” in the Italian, Spanish and Serbian languages. Since somatic idioms, especially those with the lexeme “eye”, represent a large part of the phraseological fund of the three languages, only those idioms that express attention/caution were analyzed. Italian was the source language, whereas Spanish and Serbian were the target languages. The initial hypothesis was that two typologically related languages (Italian and Spanish) would share more common phraseological characteristics than a typologically unrelated language (Serbian). In order to test the hypothesis, a research corpus was excerpted from general and phraseological dictionaries, as well as from electronic sources. Using the method of contrastive analysis, similarities and differences were identified from lexical, semantic and morphosyntactic aspects in order to establish the type of equivalence between the idioms at the interlingual level. The type of equivalence was determined for 22 idioms containing the lexeme “eye” in Italian, 19 in Spanish and 17 in Serbian. By contrasting the Italian and Spanish idioms, it can be concluded that there is absolute equivalence between 10 idioms; 5 form a relationship of partial equivalence with morphosyntactic differences, 2 a relationship of partial equivalence with lexical differences, and 1 a relationship of partial equivalence with morphosyntactic and lexical differences; zero equivalence is established between 3 idioms, and 1 false friend is identified. On the other hand, the comparison of Italian and Serbian idioms gave the following results: 7 absolute equivalents, 4 partial equivalents with morphosyntactic differences, 5 partial equivalents with lexical differences, and 1 partial equivalent with morphosyntactic and lexical differences, while zero equivalence was recorded in 5 cases, and there were no false friends.
It can be seen that in both Spanish and Serbian, the highest percentage of idioms are absolute equivalents of their Italian correspondents, although that percentage is lower in Serbian than in Spanish. Therefore, it can be concluded that the initial hypothesis has been proven, i.e. that typologically related languages share more common phraseological characteristics. However, the number of partial equivalents between Italian and Serbian is very high. This can be explained by the fact that somatic idioms, due to the universality of bodily experience, reveal similarities between languages that do not belong to the same language family. The existence of universal human knowledge and associations about body parts explains the existence of somatisms that have the same or similar structure and meaning in different languages. On the other hand, although the human body has the same functions in all languages, their symbolism is culturally determined (Kovačević, 2012: 16-17), and that could explain the existence of zero equivalence and false friends at the interlingual level.
This paper is a conceptual supplement sui generis, the aim of which is to present a modified treatment of proper names and deproprial expressions in the Academic Dictionary of Contemporary Czech (ADCC). The focus of the study is both a reflection on the past and current lexicographical practice and a discussion of the key issues related to the treatment of the respective lexical subsystem in the dictionary. First, we summarize the basic facts concerning the lexicographic processing of proprial and deproprial lexical units in the field of explanatory lexicography. Second, we provide some more general information about the ADCC and, most importantly, about the macrostructure and microstructure of the dictionary in reference to the topic of the present study. We focus on the inclusion of proprial and deproprial entries in the ADCC and the specific treatment of proper names contained in phrasemes. Special attention is paid to the microstructure of proprial and deproprial entries, as well.