(using the example of the noun mreža)
Noun phrases in Croatian can differ in the degree of correlation between their constituents. Some constituents form descriptive free word combinations (velik stol ʽlarge table’, sunčan dan ʽsunny day’, slatka kava ʽsweet coffee’, hladne ruke ʽcold hands’), while others form multiword units which concretize extra-linguistic content that cannot be expressed in one word (crna kava ʽblack coffee’, krevet na kat ‘bunk bed’, kreditna kartica ‘credit card’, radno mjesto ‘workplace’). Dependent constituents can be adjectives, which agree with the noun (velika soba ‘big room’, radno mjesto ‘working place’), or they can be adverb phrases or prepositional phrases (korak naprijed ‘step ahead’, mnogo ljudi ‘many people’, malo prijatelja ‘a few friends’, četkica za zube ‘toothbrush’, roba s greškom ‘faulty goods’). This paper analyzes the noun mreža (which has rich syntagmatic and semantic potential) and its co-occurrences, which can form either a collocation or a free combination of words. The lexicographic description is compared with the corpus data. The analysis takes into consideration a list of computationally obtained collocates (collocation candidates) of the node noun. The frequency of the words occurring within a particular span, and the strength of association between them, can differ. The list of collocates obtained from the corpus is examined to see how it coincides with the existing lexicographic description and with theoretical principles of interpreting word combinations in Croatian. The aim of the study is to determine how corpus analysis can improve the treatment of word-combination entries in lexicographic work.
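The idea of collocation candidates for a node word within a span can be illustrated with a minimal sketch. The abstract does not say which span or association measure was used, so the span of two tokens, the pointwise mutual information (PMI) score, the function name `collocation_candidates`, and the tiny lemmatized toy corpus below are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def collocation_candidates(tokens, node, span=2, min_freq=1):
    """Return (collocate, co-occurrence frequency, PMI) triples for `node`.

    Hypothetical helper: counts every token within `span` positions of each
    occurrence of `node` and scores the pair with pointwise mutual information.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(n, i + span + 1)
        for j in range(lo, hi):
            if j != i:
                pair_freq[tokens[j]] += 1

    results = []
    for word, f_pair in pair_freq.items():
        if f_pair < min_freq:
            continue
        # PMI = log2( P(node, word) / (P(node) * P(word)) ),
        # with each probability estimated as count / n.
        pmi = math.log2((f_pair * n) / (word_freq[node] * word_freq[word]))
        results.append((word, f_pair, pmi))
    # Strongest association first; ties broken by frequency, then alphabetically.
    return sorted(results, key=lambda r: (-r[2], -r[1], r[0]))


# Invented, already lemmatized toy corpus (not taken from any real corpus).
toy_tokens = (
    "ribar baca mreža u more društveni mreža povezuje ljudi "
    "ribar popravlja mreža"
).split()

candidates = collocation_candidates(toy_tokens, "mreža", span=2)
for word, freq, pmi in candidates:
    print(f"{word}\t{freq}\t{pmi:.2f}")
```

On real data, PMI tends to overrate rare words, which is one reason a frequency threshold such as `min_freq` is usually applied and the resulting list is treated only as a set of candidates to be checked against the lexicographic description.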
In the third century, a part of the elite of ancient Japanese society adopted Chinese writing and began to learn it. It is assumed that at first the Japanese read Chinese characters following the sound patterns of the ancient Japanese language, approximating the Chinese sounds. Later, however, Japanese sounds were applied to the Chinese characters, and at the same time the word order was changed into Japanese word order. This was the beginning of kanbun kundoku, or Chinese writing with Japanese readings. The term ‘Japanese readings’ is used here in both senses: reading each individual character as a Chinese character, or reading the characters while rearranging the word order of the Chinese text into a Japanese translation. When Chinese characters were adopted for use in Japan, they were at first read as Chinese sounds with a Japanese pronunciation approximating that of the Chinese reading. Thereafter, the Japanese translation of individual readings of Chinese characters, known as ‘kundoku’, began. ‘Kundoku’ is still used today along with ‘ondoku’ (reading characters with their Chinese-derived pronunciations) for reading Chinese characters used in Japanese, i.e. in ‘kanbun kundoku’. This first reading is important in the history of modern Japanese translation, because when the Japanese first encountered Western languages, this method of Chinese translation readings was used for translation from English, French, and so on. In short, Japanese people created another style of written Japanese for translation, dating back to the Chinese writing system and distinct from the traditional ancient Japanese language system. In Japan, however, after Chinese characters were introduced from China, the Japanese created a style of native Japanese readings. Japanese translators have translated naturally according to their own logic and style.
Knowledge of people’s health information needs and information behaviour can be used in planning health interventions so that they meet people’s needs as accurately as possible and reflect how health information is acquired and processed. Aim. The aim of the study presented in this paper was to analyze the usefulness of online forums as a source of scientific knowledge about people’s health needs and information behaviour, which could then be actively used in the area of public health. Method. The content, a total of 1,575 entries derived from two open forums devoted to depression in the years 2012–2015, was analysed using a set of mixed methods, including: a formal (quantitative) analysis of the material using the tools of computational linguistics (QDA Miner, SimStat), inductive theme analysis EMIC, in the so-called hard variety, reinforced by elements of Awdiejew’s conversational grammar, and the comparative method. Results. Both health information needs and behaviour can be identified on Internet forums dedicated to health problems. Linguistic analysis of online forums can yield results and clues that cannot be obtained using questionnaires or personal interviews. It seems, however, that it should never be the only method used in investigating this matter. Since several intervening factors may distort the reliability of the findings, determining whether we are dealing with real or created needs or behaviour requires confirming the results of the linguistic analysis of the forums using other methods.
The importance of terminology in the era of globalisation stems from the fact that nowadays the rapid growth of scientific and technological knowledge is practically impossible without paying attention to the state of terminology. Special lexical units comprise more than 90 percent of the new words in modern languages. The growth of scientific and technical vocabularies is much faster than that of the everyday speech vocabulary, so at present the number of terms in some sciences (for example chemistry or biology) exceeds the number of common words. We can compare the following figures: the full, unabridged version of Webster's dictionary contains about 700,000 English words; the largest Russian 17-volume dictionary treats some 120,000 words (though there are already dictionaries containing ca. 200,000 words); at the same time, Russian construction terminology numbers more than 150,000 words, modern biological terminology exceeds a million names for varieties of living beings, and in chemistry we know more than 1.5 million substances. In industry, already at the beginning of the 1980s, more than 20 million types of various products were manufactured, each of them having its own special name.
In my article I will present the language situation in Sweden and the Swedish Centre for Terminology TNC, the „hub“ of Swedish terminology work for more than 65 years. Terminology work in Sweden is also performed in a systematic way by others: some public agencies and private companies handle their own terminology needs, but still in close contact with TNC. However, a new terminology market has evolved in which consultants other than TNC offer terminology services, and this will also be mentioned.
The article deals with Genus-Species and Whole-Part relations between terms of linguistics as evidenced by their definitions in one of the most widely used student textbooks on linguistics. A procedure of Genus-Species analysis is specified to parse the Definiens wording into Genus wording and Differentia Specifica wording(s). The resulting data of the Genus-Species and Whole-Part analysis have been entered into a computer Terminology Knowledge Base to present the Genus-Species and Whole-Part structure of linguistic terminology in a computer form that enables the Knowledge Base user to navigate the respective structures and to find every particular term at its own level of these structures.
A contrastive corpus-based study
The English verbal construction fail to x allows two interpretations: in the first, the verb has the full lexical meaning of not being successful in what you are trying to achieve, whereas in the second, it shows signs of semantic bleaching, and is thus interpreted as a grammaticalized marker of negation. Taking into account the syntactic and semantic properties of the construction fail to x, the present analysis examines its distribution in two types of corpora. General corpora (the British National Corpus and the Corpus of Contemporary American English) are used to examine the distribution of both meanings, the non-bleached and the bleached, in English. To further elaborate the findings and contrast them on a cross-linguistic level, the parallel English-Slovenian corpus (European Commission’s DGT Translation Memory) is used to observe the translations of the construction fail to x into Slovenian. The parallel corpus of legislative language demonstrates the impact of register on the use of fail to x, and addresses the claims that the bleached fail is characteristically found in more formal registers.
Corpus studies show that in some languages, English being a case in point, verbal idioms exhibit the greatest degree of variation, with the verb being the most variable component. Unsurprisingly, monolingual dictionaries of English and other languages tend to list verb substitutions. Croatian dictionaries adopt the same practice with variable consistency: they list all the lexical variants in certain entries, and only some variants in others. In the latter case, abbreviations such as i sl. ‘and similar’ and itd. ‘etc.’ are used. Moreover, in cases where verb variability is indicated in the entry, no illustrations of the use of the other verbs are provided. All this may present difficulties for users, with the use of itd. ‘etc.’ being particularly problematic, because it signals that the variable verb may be replaced by (presumably any) semantically unrelated verb. The conventionality/innovativeness of idiom variants may also present a problem for dictionary users, because the dictionaries are not consistent in providing examples of use, or indicating creative uses. The aim of this paper is to show that the number of verb substitutions in idioms is limited, which is not signalled by abbreviations such as i sl. ‘and similar’. Furthermore, we will show that the choice of verb substitutions is not completely free, but depends on conceptual motivation, which has important consequences for their lexicographic treatment. We extracted 187 verbal idioms containing the abbreviations i sl. and itd. from the Croatian Dictionary of Idioms (2014) and tested their use in the hrWaC corpus. The results show that there are significant differences between variant forms in frequency and use, and that some idioms occur in forms and meanings which are not listed in the dictionary. Based on this, we argue that conventionalized lexical variants should be listed in the dictionary to enhance users’ idiom comprehension and confidence of use.
Based on our data and current lexicographic practice in other languages, we propose several possibilities of treating idioms with variable verbal components in monolingual dictionaries of Croatian.
This paper is a write-up of a keynote from El’Manuscript 2021, reflecting on the ways in which the field of computationally supported medieval Slavic studies has and has not changed since the mid-2000s. Looking towards developments in the broader fields of digital humanities and natural-language processing, it explores the ways that recent improvements in the tools at our disposal for mass digitization of manuscripts and text analysis at scale open up possibilities for working with manuscripts that have received very little attention. For these advancements to be feasible, however, scholars will need to prepare and share their digitized texts and annotations in ways that are not currently the norm, though a number of projects provide exemplary models of how these new conventions could be put into practice.
Digital annotation of verbal aspect in Old Russian and Church Slavonic texts is a challenging task that requires a complex approach. When studying Slavic aspect systems synchronically, we always know whether the verb is perfective, imperfective or biaspectual; however, this is often not the case when aspect is researched from a diachronic perspective. Determining the aspectual status of a particular verb for earlier stages is possible only after considering different parameters together, such as actionality, lexical semantics, morphology, functional distribution, syntactic restrictions, collocations, statistics, etc. All essential parameters should be annotated sufficiently for effective use of a corpus. That would enable a researcher to quickly collect the information necessary to build the aspectual profile of a verb. It is also important to understand the hierarchy of the parameters, as they might have different degrees of importance, and for this purpose a special algorithm should be developed. The preliminary results, related to the parameters of annotation and the algorithm for aspect determination (using ‘Morphy’, the System for digital morphological annotation of Old Russian and Church Slavonic manuscripts, developed at the Vinogradov Russian Language Institute RAS), are discussed in the paper.
The article presents the basic principles of designing the diachronic linguistic corpus of documents of the Don Cossack Host offices from the State Archive of the Volgograd region, Russia, including collecting documents for the text corpus, arranging the technical base for automatic processing and text editing, scheduling automated tagging, morphological annotation, and corpus software tools. The authors explain some technical aspects of corpus processing and of the composition of the text corpus. It is considered reasonable to add any document to the corpus, including draft texts with crossed-out fragments, as this ensures accurate registration of the grammar and vocabulary of the language at a certain historical period. A set of language marker types has been developed for automated meta-tagging. The corpus software tools are defined to enable accurate annotation of obsolete fonts so that they can be processed together with regular language units and expressions in morphological and genre meta-tagging; in cases of partial text adaptation, the authentic old graphic symbols may have to be preserved.
When the scribe or author of a larger collection of manuscripts is unknown or in doubt, analyzing such manuscripts can take a lot of time and effort. The more pages and potential writers are involved, the more complicated it is to get tangible results. LiViTo is a free tool that requires a minimum of experience with the command line and allows a simplified search for keywords, revisions, and clustering of historical manuscripts. We present the application of LiViTo to the “lab case” of the biographies of Czech Protestant refugees from the 18th–19th century. Most of these manuscripts include stories of farmers’ and craftsmen’s families who fled to Berlin because of their religious beliefs. This is the first time that biographies and manuscripts of this type have been examined for Czech using the methods of Digital Humanities. Using extracts from the research project in which LiViTo was developed, individual functions of the tool are explained. In addition, individual findings relating to the manuscripts and the potential further development of the tool are presented.
The article deals with various efforts of the Staatsbibliothek zu Berlin (SBB) to make the content of its collection of about 250 Church Slavonic prints from the 17th to the 19th century accessible using the methods of modern information technology from the Digital Humanities sector. The focus is on full-text indexing of the heterogeneous Church Slavonic prints using HTR+ language models from the programme Transkribus. Depending on whether they are Moscow, Kiev or Old Believer prints, these models require different approaches and corresponding adaptations that take into account the printing area and printing period. Prints such as Kirillova kniga (1644) or Gistorija Ioanna Damaskina (1637) and many others are processed at large scale, whereby the developed character recognition models are constantly refined by training on new verified data. The full texts generated in this way are permanently stored in various XML formats (ALTO, PAGE) in a central repository for subsequent use, and they are also merged with the original digital copies in the IIIF-compatible Digital Library of the SBB. As a further element, the Church Slavonic full texts will be indexed using special SOLR analyzers for efficient searches (tokenising, transliteration, n-grams) and made searchable in subject portals (including the Slavistik-Portal) using modern text-image web design.
The paper discusses some results obtained as part of an ongoing project at the Slavic Institute of Heidelberg University to produce automatic transcriptions of an early 18th century trilingual printed dictionary (Fedor Polikarpov’s Leksikon trejazyčnyj) and, on a preliminary basis, of a 17th century trilingual manuscript (Epifanij Slavineckii’s working copy of his Greek–Slavic–Latin dictionary) using the handwritten text recognition (HTR) platforms Transkribus and eScriptorium. It is argued that there are considerable advantages to employing such tools in terms of the simplification and acceleration of work on multilingual edition projects. Moreover, a comparison of our experience working with Transkribus and eScriptorium is given, along with an overview of the practical benefits and challenges of working with each of these platforms.
We report on applying Handwritten Text Recognition (HTR) to manuscripts from the archive of Konstantin Rychkov preserved at IOM RAS, St. Petersburg, within the INEL project. Folklore texts in Evenki (Tungusic) were collected in Western Siberia in the 1910s. We used services provided by the Transkribus platform. The necessary step of Layout Analysis proved to be time-consuming because the parallel Evenki-Russian text is arranged on the page without a strict separation line. HTR models have been trained successively on different amounts of data, up to 521 pages. The best Character Error Rate (CER) attained on validation data for the largest dataset is 4.50% for models trained on all characters. The distribution of errors is non-uniform: most errors are due to just a few problematic issues, especially diacritics such as the accent marking stress. It is written high above the line and is frequently cut off from the line images at the preprocessing stage. After excluding the stress mark from training data and recognition, the lowest CER dropped to 2.90%. We compared two recognition engines, HTR+ and PyLaia. The HTR+ model trained without stress marks made fewer errors in letters, while PyLaia performed better with respect to diacritics.
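To make the Character Error Rate figures concrete, the following is a minimal sketch of how such a score is commonly computed: the Levenshtein edit distance between a reference transcription and the recognized text, divided by the length of the reference. This is the textbook definition and is not claimed to match how Transkribus computes its own figures. The helper names (`edit_distance`, `cer`, `strip_stress`) and the sample strings are hypothetical, and treating the stress mark as the combining acute accent U+0301 is an assumption.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # assumed encoding of the stress mark


def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def strip_stress(text):
    """Remove combining acute accents, leaving other diacritics intact."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch != COMBINING_ACUTE)
    return unicodedata.normalize("NFC", kept)


# Invented example: the reference carries a stress mark the recognizer missed.
reference = "о\u0301рон"
recognized = "орон"

print(cer(reference, recognized))                # scored with the stress mark
print(cer(strip_stress(reference), recognized))  # scored without it
```

Scoring with and without the stress mark in this way mirrors the comparison in the abstract: a missed diacritic counts as one character error, so removing it from both training data and evaluation lowers the measured rate.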
The author compares the marginal glosses in the book of Epifanij Slavinetskij’s Sbornik perevodov, 1665, with the text of Athanasius’ Third Oration against the Arians in Gavrilo Venclović’s Razglagolnik, 1734. The marginal glosses in Epifanij’s Russian Version are taken from a South Slavonic manuscript that has a common origin with the protograph of Venclović. The Orationes contra Arianos in Razglagolnik are written in South Slavonic koine and their source has the features of an Athonite translation related to the Council of Ferrara-Florence and the disputes over the filioque.
The text transmission of the Slavonic translation of Hippolytus’ De Christo et Antichristo presents a stable and well-attested tradition. It provides a basis for a possible reconstruction of the Greek original from which this translation was made. The article demonstrates some omissions, additions, and reconstructions in the Greek text compared to the Slavonic one. The paper also addresses significant problems that occur in scholars’ work on bilingual dictionaries, discussing possible approaches and solutions. Still, some questions remain, and it is not easy to suggest a definite answer to them. The author underlines the importance of the fragmentary copy of the Greek text in the manuscript Meteora 573, bearing in mind its significant correspondence to the Slavonic tradition. Unfortunately, this manuscript preserves only small fragments of the whole work by Hippolytus of Rome.
The article focuses on Old Slavonic versions of Euthalian chapter-lists to Acts and Epistles considering meta-communicative terms, such as παραίνεσις or προοίμιον. The author aims to evaluate the level of accuracy of Slavonic translations and their exegetical potential, which makes the content of the main text of Acts and Epistles clear. The analysis reveals two tendencies prevailing in Slavonic sources from the 12th–16th centuries: first, there are phenomena of lexical variability, as results of applying various translation strategies, more or less successful in terms of the accuracy and clarity of the resulting text (calques, periphrastic constructions, and text expansion). Second, there is a tendency towards unification, suggesting a universal Slavonic term for several Greek correlates. Authoritative dictionaries, including academic ones, do not record some lexemes. There is no dependence of the chapter-lists lexicon on the main text vocabulary.
The focus of this report is the still-unexplored Interpretation of Orthodox liturgy, attested in two copies: the first in manuscript No. 88 from the collection of Obolensky (201), State Archive of the Russian Federation (Moscow), the second in manuscript No. 52 of 1567, from the Archive of Baltazar Bogišić in Cavtat. The two manuscripts contain proven original works of Constantine of Kostenets (1380–1431). The author analyzes the structure and content of the interpretation and comments on it as a source for the history of the liturgy, from the point of view of the data concerning the liturgical features described in it. It can be concluded that the basis of the texts in MS No. 88 and MS Bogišić 52 is a late composition of Byzantine mystagogy, which, in turn, means that the South Slavic translation should be dated no earlier than the end of the 12th century. This is one of the many short epitomes created during the Second Bulgarian Kingdom as a result of the secondary reduction of the original extensive commentary. A detailed investigation and a text-critical edition are forthcoming.