
4th International Conference on Corpus Linguistics (CILC2012), 22–24 March 2012
http://www.cilc2012.es
This text is a report from the international conference "Corpora in German Linguistics: Oral, Written and Multimedia", organized by the Leibniz Institute for the German Language and held online on March 15–17, 2022.
This paper presents a digital edition of the manuscript of the first Russian translation of Leprince de Beaumont’s The Beauty and the Beast fairy tale (1756), aligned to its French original. The translation was made in 1758 by a twelve-year-old girl, Khionia Demidova (1746–1792), and dedicated to her elder brother. The original manuscript is held at the scientific library of Saratov State University (no. 456). The document is interesting from several points of view: the “naive” translation made by a young girl shows how French literature was perceived in 18th-century Russia, which aspects of the French language and of Western European socio-cultural phenomena were difficult to understand, and how those phenomena were perceived. The peculiarities of Khionia’s spelling and punctuation provide data on her knowledge of Russian grammar and orthography. The digital edition includes a multi-layer transcription of the source document aligned with a digital facsimile and the original French text. It is published online on the TXM-IHRIM web portal (https://txm-ihrim.huma-num.fr). The edition workflow (Microsoft Word, Oxgarage and TXM) may be reused for similar editions and text corpora.
The much-examined story of A Lost Lady usually depicts Mrs. Forrester’s success in meeting and adapting to the challenges of a changing world, a world characterized by materialism and self-fulfilment. However, the overlooked story, one far more disturbing than the privileged story in the text, is the narrative of oppressed groups of people of other races and the lower class. Drawing on some aspects of postcolonial theory, this paper explores Willa Cather’s own reactions to real changes in her society and to the waning power of imperialism, as well as her nostalgic longing for the western prairies of her youth, a longing that shows no sympathy for the dispossessed Native Americans and other oppressed races. It also discloses the unmistakable colonial overtones, which resonate remarkably with the common discourse of “Manifest Destiny” during the period of American expansion into the Wild West.
In this paper we present a mixed-principle rule-based approach to the automatic syllabification of Serbian, based on prescriptive rules from traditional grammar in combination with the Sonority Sequencing Principle. We explore the problems and limitations of the existing rule set and of sonority-based approaches, introduce an algorithm that uses both in an attempt to produce a more accurate segmentation of words into syllables, one better aligned with the intuitions of native speakers, and present statistical data on the distribution of syllables and their structure in Serbian.
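As a rough illustration of the sonority-based half of such an approach, the following Python sketch splits a word so that each syllable onset rises in sonority toward its vowel. It is not the algorithm from the paper: the sonority scale, the `SONORITY` table and the `syllabify` function are hypothetical, the input is assumed to be Latin-script text with one character per sound (digraphs such as lj, nj and dž are not handled), and syllabic r and the prescriptive grammar rules are left out entirely.

```python
# Hypothetical sonority scale (higher = more sonorous); an assumption, not from the paper.
SONORITY = {}
for chars, rank in (
    ("aeiou", 5),       # vowels
    ("jv", 4),          # glides / approximants
    ("lr", 3),          # liquids
    ("mn", 2),          # nasals
    ("szšžfh", 1),      # fricatives
    ("pbtdkgcčćđ", 0),  # stops and affricates
):
    for ch in chars:
        SONORITY[ch] = rank

VOWELS = set("aeiou")


def syllabify(word):
    """Split a word into syllables using only the Sonority Sequencing Principle."""
    word = word.lower()
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(nuclei) < 2:
        # Zero or one vowel: nothing to split (syllabic r is not modelled).
        return [word]

    boundaries = []
    for left, right in zip(nuclei, nuclei[1:]):
        # Grow the onset leftward from the right-hand vowel for as long as
        # sonority keeps strictly rising toward that vowel.
        start = right
        while (start - 1 > left
               and SONORITY.get(word[start - 1], 0) < SONORITY.get(word[start], 0)):
            start -= 1
        boundaries.append(start)

    # Cut the word at the collected boundaries.
    pieces, prev = [], 0
    for b in boundaries:
        pieces.append(word[prev:b])
        prev = b
    pieces.append(word[prev:])
    return pieces


print(syllabify("voda"))
print(syllabify("majka"))
```

A purely sonority-driven split like this will disagree with traditional grammar rules for some consonant clusters, which is the kind of mismatch a mixed-principle approach is meant to address.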
Zero anaphora is an element of the coreference resolution task that has not yet been directly addressed for Polish; most studies have left it, as the most challenging aspect, for further investigation. This article presents an initial study of the problem. It discusses the preparation of a machine learning approach, together with the engineering of features based on a linguistic study of the KPWr corpus. The study uses existing tools for Polish coreference resolution as sources of partial coreferential clusters containing pronoun, noun and named entity mentions. These tools also serve as baseline zero coreference resolution systems for comparison with our system. The evaluation focuses not only on clustering correctness, measured with the standard CoNLL-2012 metrics without regard to mention type, but also on the informativeness of the resulting relations. Under the annotation approach used for coreference in the KPWr corpus, only named entities are treated as mentions informative enough to constitute a link to real-world objects. Consequently, we provide an evaluation of informativeness based on the links found between zero anaphoras and named entities. For the same reason, we restrict coreference resolution in this study to mention clusters built around named entities.
In the beginning, the World Wide Web was syntactic and its content was readable only by humans. The modern web combines existing web technologies with knowledge representation formalisms. In this sense, the Semantic Web proposes the mark-up of web content using formal ontologies that structure essential data for the purpose of comprehensive machine understanding. On the syntactic level, standardization is an important topic. Many standards that can be used to integrate different information sources have evolved. Besides classical database interfaces such as ODBC, web-oriented standard languages such as HTML, XML, RDF and OWL are growing in importance. As the World Wide Web offers the greatest potential for sharing information, we base our paper on these evolving standards.
In this study, natural language processing (NLP) is used to analyse nominal inflection in Estonian proficiency examination writings representing the CEFR levels A2–C1. The aim is to define the nominal features that distinguish learner language production at each proficiency level. For this purpose, the frequency and variation of inflectional forms are measured in two ways: a) for the nominal parts of speech (PoSs) in total, i.e., considering the use of nouns, pronouns, adjectives and numerals; b) for nouns, pronouns and adjectives individually (numerals were discarded due to low frequency). The analysed corpus contains 480 texts, 120 for each level. Nominal features based on the grammatical categories of number, case and degree of comparison are extracted from the morphologically tagged and manually corrected output of the Stanza NLP toolkit. Relevant features are selected according to the following criteria: they correlate with the proficiency level, their values change monotonically, and there are statistically significant differences between (some) adjacent levels. A2–C1-level texts are consistently distinguished by the number of cases used in the text as well as the ratio of singular and plural forms. The changes in the frequency of nominal inflectional forms mainly occur from level B1 to C1. The use of the translative, nominative and genitive cases is more strongly related to the text level, while the partitive, inessive, elative and comitative cases and comparative adjectives also differentiate some levels. Furthermore, the study indicates that it is beneficial to observe inflection-based features separately for each PoS when analysing L2 development. Firstly, the PoS-specific frequencies of some grammatical categories increase at different stages of proficiency. Secondly, changes may emerge for certain PoSs only. The identified criterial features could be used for automated assessment of Estonian L2 writings alongside lexical, syntactic and other linguistic features. The results can also help to specify the CEFR level descriptions for Estonian.
This paper addresses the poorly understood patterning in the presence vs. absence of the accusative resumptive pronoun in Czech relative clauses (RCs) introduced by the absolutive relativizer co. Using both qualitative and frequency-based quantitative analysis, I investigate the distribution of the resumptive pronoun in authentic usage as attested in the Czech National Corpus. The study leads to the conclusion that the criteria determining the distribution of the accusative resumptive pronoun go well beyond the traditionally invoked need to express agreement categories (gender, number) and grammatical relations (accusative object), and beyond the claim that the presence vs. absence of the pronoun depends exclusively on the animacy of the relativized noun. Instead, the distribution appears to depend on the semantic compatibility between the relativized noun and the proposition expressed by the RC, reflecting a functional distinction between a determinative and a non-determinative (explicative) interpretation of the RC; the former is unambiguously signaled by the bare relativizer co, while the latter is available with the analytic pattern co + resumptive pronoun (ACC) as one of the interpretive options.
Intercultural communicative competence (ICC) is an indispensable skill when interacting with people from other cultures, given the clash of perspectives that intercultural encounters may bring about. Because it is a skill that can be taught and learned, there is wide interest in developing ICC through formal education. This involves designing specific training tasks that can enhance the acquisition of ICC with the help of virtual exchange (VE) activities. The aim of the present paper is to highlight a specific way in which the educational goals associated with ICC development can be achieved. To this end, 55 eTwinning intercultural projects have been analysed in order to determine the relationship between ICC and VE. The statistical data described here indicate that VE fosters the development of ICC. Moreover, they indicate that the VE task types most effective in developing ICC can be identified through computation.
In the third century, part of the elite of ancient Japanese society adopted Chinese writing and began to learn it. It is assumed that at first the Japanese read Chinese characters following the sound patterns of the ancient Japanese language, approximating the Chinese sounds. However, Japanese sounds were then applied to the Chinese characters, and at the same time the word order was changed into Japanese word order. This was the beginning of kanbun kundoku, or Chinese writing with Japanese readings. The term ‘Japanese readings’ is used here in both senses: reading each individual character as a Chinese character, and reading the text by rearranging the word order of the Chinese writing into a Japanese translation. When Chinese characters were adopted for use in Japan, they were at first read as Chinese sounds, with a Japanese pronunciation approximating that of the Chinese reading. Thereafter, the type of Japanese translation reading of individual Chinese characters known as ‘kundoku’ began. ‘Kundoku’ is still used today along with ‘ondoku’ (reading characters with their Chinese-derived pronunciations) for reading Chinese characters used in Japanese, i.e. in ‘kanbun kundoku’. This first reading is important in the history of modern Japanese translation, because when the Japanese first encountered Western languages, this method of Chinese translation reading was used for translation from English, French, and so on. In short, the Japanese created another style of written Japanese for translation, dating back to the Chinese writing system and distinct from the traditional ancient Japanese language system. In Japan, however, after Chinese characters were introduced from China, the Japanese created a style of native Japanese readings. Japanese translators have translated naturally according to their own logic and style.
The article presents the basic principles of designing a diachronic linguistic corpus of documents of the Don Cossack Host offices from the State Archive of the Volgograd region, Russia. These include collecting documents for the text corpus, setting up the technical base for automatic processing and text editing, planning automated tagging and morphological annotation, and choosing corpus software tools. The authors explain some technical aspects of corpus processing and corpus composition. It is considered reasonable to add any document to the corpus, including draft texts with crossed-out fragments, as this ensures accurate registration of the grammar and vocabulary of the language at a given historical period. A set of language marker types is developed for automated meta-tagging. The corpus software tools are chosen to enable accurate annotation of obsolete fonts, so that they can be processed together with regular language units and expressions in morphological and genre meta-tagging; in cases of partial text adaptation, the authentic old graphic symbols may have to be preserved.
This paper reviews advances in the use of speech recognition (SR) technology in EFL/ESL classrooms over the last few decades, addresses researchers’ and educators’ concerns about the limitations of this technology, and examines how far SR technology has evolved in its own field. Finally, the potential pedagogical implications of SR technology for EFL/ESL, its limitations, and suggestions for further studies are discussed.
In this study we examine the occurrences and correspondences of terms for affinal kinship in a Bulgarian–Ukrainian parallel corpus of fiction. All instances of the terms selected for study, matching and non-matching, were located and counted, and the frequencies compared. Some of the asymmetries found may have roots in culture and history whilst others reflect diverse features of language and the practice of literary translation.
Normalizing historical texts, in other words converting them to modern spelling, enables us to analyze them with tools designed for contemporary language. It also makes it possible to search the texts for different keywords and to compare the old spelling with contemporary spelling automatically. This article gives a general overview of normalization, its different methods, previously performed experiments and the main problems in the context of old Estonian texts from the second half of the 19th century.
This article deals with the best media, or media-appropriate ways, for memorizing vocabulary. It presents an empirical study in which participants had to memorize vocabulary in an unknown language in three different ways. Three experimental groups were presented with Hungarian vocabulary to learn. The first group learnt a vocabulary list from a sheet of paper, the second from a computer monitor without any animation, and the third from an animated Flash file. The results of this study are reported and discussed.
The paper presents the development, within a research project, of an interactive system of grammatical analysis for texts written in Romanian. Two products realised as practical applications are presented: a grammar checker for Romanian and an educational application that assists in teaching and learning Romanian (as a foreign language).
Stylometric techniques are usually applied to a limited number of typical tasks, such as authorship attribution, genre analysis, or gender studies. However, they could be applied to several tasks beyond this canonical set, if only stylometric tools were more accessible to users from different areas of the humanities and social sciences. This paper presents a general idea, followed by a fully functional prototype, of an open stylometric system that facilitates wide use through two aspects: technical and research flexibility. The system relies on a server installation combined with a web-based user interface. This frees the user from the necessity of installing any additional software. At the same time, the system offers a variety of ways in which the input texts can be analysed: they include not only the usual lexical level, but also deep-level linguistic features. This enables a range of possible applications, from typical stylometric tasks to the semantic analysis of text documents. The internal architecture of the system relies on several well-known software packages: a collection of language tools (for text pre-processing), Stylo (for stylometric analysis) and Cluto (for text clustering). The paper presents: (1) the idea behind the system from the user’s perspective; (2) the architecture of the system, with a focus on data processing; (3) features for text description; (4) the use of analytical systems such as Stylo and Cluto. The presentation is illustrated with example applications.
The purpose of this paper is to provide an overview of language policy in France in relation to French and the regional languages. We start the overview from the Renaissance period, when French national feeling began to form and the distinctiveness of the French nation started to manifest itself, leading to increased use of the French language and the gradual superseding of the regional languages. Since the policy of national unity intensified after the French Revolution in 1789, and the course of action towards the languages of the territory changed accordingly, we divide the overview of language policy in France into two parts: before and after the Revolution. For the revolutionaries, ignorance of the French language was an obstacle to democracy and to the spread of revolutionary ideas, so the superseding of the regional languages continued throughout the 19th and early 20th centuries. After World War II, the regional languages and cultures received more attention and came to be regarded as a treasure to be preserved and protected from disappearance. Based on the relations and language activities undertaken by France in the contemporary period, we distinguish language policy in relation to the French language and language policy in relation to the regional languages.
The jargon of informatics has developed very quickly in the Romanian language, especially in the last two decades. Computers have influenced the speech of young people, particularly in its lexical aspect. The nouns and verbs borrowed from English and used in chat conversations have also entered the everyday speech of the young. Some of them even give rise to whole lexical groups. There are categories of words that are of interest from the point of view of morphology, semantics, spelling and orthoepy. Anglicisms are lexemes that cause problems of linguistic integration and adaptation.