
4th International Conference on Corpus Linguistics (CILC2012), 22–24 March 2012
http://www.cilc2012.es
More...
This paper presents a digital edition of the manuscript of the first Russian translation of Leprince de Beaumont’s fairy tale The Beauty and the Beast (1756), aligned with its French original. The translation was made in 1758 by a twelve-year-old girl, Khionia Demidova (1746-1792), and dedicated to her elder brother. The original manuscript is held at the scientific library of Saratov State University (no. 456). The document is interesting from several points of view: the “naive” translation made by a young girl allows us to understand how French literature was perceived in 18th-century Russia, which aspects of the French language and of Western European socio-cultural phenomena were difficult to understand, and how those phenomena were perceived. The peculiarities of Khionia’s spelling and punctuation provide data on her knowledge of Russian grammar and orthography. The digital edition includes a multi-layer transcription of the source document aligned with a digital facsimile and the original French text. It is published online on the TXM-IHRIM web portal (https://txm-ihrim.huma-num.fr). The edition workflow (Microsoft Word, OxGarage and TXM) may be reused for similar editions and text corpora.
More...
The much-examined story of A Lost Lady usually depicts Mrs. Forrester’s success in meeting and adapting to the challenges of a changing world, a world characterized by materialism and self-fulfilment. However, the overlooked story, one far more disturbing than the privileged story in the text, is the narrative of oppressed groups of people of other races and the lower class. Drawing on some aspects of postcolonial theory, this paper explores Willa Cather’s own reactions to real changes in her society and to the waning power of imperialism, as well as her nostalgic longing for the western prairies of her youth, a longing that shows no sympathy for the dispossessed Native Americans and other oppressed races. It also discloses the unmistakable colonial overtones, which resonate remarkably with the common discourse of “Manifest Destiny” during the period of American expansion into the Wild West.
More...
In this paper we present a mixed-principle rule-based approach to the automatic syllabification of Serbian, based on prescriptive rules from traditional grammar in combination with the Sonority Sequencing Principle. We explore the problems and limitations of the existing rule set and of sonority-based approaches, introduce an algorithm that uses both in an attempt to produce a more accurate segmentation of words into syllables, one better aligned with the intuitions of native speakers, and present statistical data on the distribution of syllables and their structure in Serbian.
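As a rough illustration of the sonority-based half of such an approach, the following Python sketch places each syllable boundary at the point where sonority stops rising toward the next vowel (onset maximization under the Sonority Sequencing Principle). It is not the authors' algorithm: the sonority scale, the function name `syllabify`, and the treatment of every letter as one segment (so digraphs such as lj, nj, dž and syllabic r are ignored) are simplifying assumptions, and the prescriptive grammar rules that the paper combines with sonority are not modelled.

```python
# Minimal sketch of sonority-based syllabification (assumed scale, not the paper's).
VOWELS = set("aeiou")

# Hypothetical sonority scale: higher value = more sonorous.
# Any letter not listed (stops, affricates) defaults to 0.
SONORITY = {}
SONORITY.update({ch: 4 for ch in VOWELS})   # vowels
SONORITY.update({ch: 3 for ch in "jlr"})    # glide and liquids
SONORITY.update({ch: 2 for ch in "mn"})     # nasals
SONORITY.update({ch: 1 for ch in "fvszšžh"})  # fricatives


def sonority(ch: str) -> int:
    """Return the assumed sonority value of a single letter."""
    return SONORITY.get(ch, 0)


def syllabify(word: str) -> list[str]:
    """Split a word into syllables using vowels as nuclei.

    Between two nuclei, the onset of the second syllable is extended
    leftwards for as long as sonority strictly rises toward the nucleus.
    """
    word = word.lower()
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        return [word]  # no vowel: leave the word unsplit

    boundaries = []
    for left, right in zip(nuclei, nuclei[1:]):
        start = right  # begin with an empty onset
        # Absorb consonants while each one is less sonorous than its right neighbour.
        while start - 1 > left and sonority(word[start - 1]) < sonority(word[start]):
            start -= 1
        boundaries.append(start)

    syllables, prev = [], 0
    for b in boundaries:
        syllables.append(word[prev:b])
        prev = b
    syllables.append(word[prev:])
    return syllables


if __name__ == "__main__":
    for w in ["voda", "majka", "pravda", "sestra"]:
        print(w, "-".join(syllabify(w)))
```

A purely sonority-driven split of this kind can disagree with traditional prescriptive rules for some consonant clusters, which is the sort of mismatch a mixed-principle approach is meant to address.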
More...
Zero anaphora is an element of the coreference resolution task that has not yet been directly addressed in Polish; in most studies it has been left for further investigation as the most challenging aspect. This article presents an initial study of the problem. The preparation of a machine learning approach, together with feature engineering based on a linguistic study of the KPWr corpus, is discussed. The study uses existing tools for Polish coreference resolution as sources of partial coreferential clusters containing pronoun, noun and named entity mentions. They are also used as baseline zero coreference resolution systems for comparison with our system. The evaluation focuses not only on clustering correctness, measured with the standard CoNLL-2012 metrics without taking mention types into account, but also on the informativeness of the resulting relations. According to the annotation approach used for coreference in the KPWr corpus, only named entities are treated as mentions informative enough to constitute a link to real-world objects. Consequently, we provide an evaluation of informativeness based on the links found between zero anaphoras and named entities. For the same reason, we restrict coreference resolution in this study to mention clusters built around named entities.
More...
In the beginning, the World Wide Web was syntactic and its content was readable only by humans. The modern web combines existing web technologies with knowledge representation formalisms. In this sense, the Semantic Web proposes marking up content on the web using formal ontologies that structure essential data for the purpose of comprehensive machine understanding. On the syntactic level, standardization is an important topic. Many standards that can be used to integrate different information sources have evolved. Besides classical database interfaces such as ODBC, web-oriented standard languages such as HTML, XML, RDF and OWL are increasing in importance. As the World Wide Web offers the greatest potential for sharing information, we base our paper on these evolving standards.
More...
This paper addresses the poorly understood patterning in the presence vs. absence of the accusative resumptive pronoun in Czech relative clauses (RC) introduced by the absolutive relativizer co. Using both qualitative and frequency-based quantitative analysis, I investigate the distribution of the resumptive pronoun in authentic usage as attested in the Czech National Corpus. The study leads to the conclusion that the criteria determining the distribution of the accusative resumptive pronoun go well beyond the traditionally invoked need to express agreement categories (gender, number) and grammatical relations (accusative object), and beyond the claim that the presence vs. absence of the pronoun depends exclusively on the animacy of the relativized noun. Instead, the distribution appears to depend on the semantic compatibility between the relativized noun and the proposition expressed by the RC, reflecting a functional distinction between a determinative and a non-determinative (explicative) interpretation of the RC; the former is unambiguously signaled by the bare relativizer co, the latter is available with the analytic co + resumptive pronoun (ACC) pattern as one of the interpretive options.
More...
Intercultural communicative competence (ICC) is an indispensable skill when interacting with people from other cultures, given the clash of perspectives that intercultural encounters may bring about. Because ICC is a skill that can be taught and learned, there is widespread interest in developing it through formal education. This involves designing specific training tasks that can enhance the acquisition of ICC with the help of virtual exchange (VE) activities. The aim of the present paper is to highlight a specific way in which the educational goals associated with ICC development can be achieved. To this end, 55 eTwinning intercultural projects were analysed in order to determine the relationship between ICC and VE. The statistical data described here indicate that VE fosters the development of ICC. Moreover, they indicate that the VE task types most effective in developing ICC can be identified through computation.
More...
In the third century, part of the elite of ancient Japanese society adopted Chinese writing and began to learn it. It is assumed that at the beginning the Japanese read Chinese characters following the sound patterns of the ancient Japanese language, approximating the Chinese sounds. Later, however, Japanese sounds were applied to the Chinese characters, and at the same time the word order was changed into Japanese word order. This was the beginning of kanbun kundoku, or Chinese writing with Japanese readings. The term ‘Japanese readings’ is used here in both senses: reading each individual character as a Chinese character, or reading the characters while rearranging the word order of the Chinese text into a Japanese translation. When Chinese characters were adopted for use in Japan, they were at first read as Chinese sounds with a Japanese pronunciation approximating that of the Chinese reading. Thereafter, the type of Japanese translation for individual readings of Chinese characters known as ‘kundoku’ began. ‘Kundoku’ is still used today along with ‘ondoku’ (reading characters with their Chinese-derived pronunciations) for reading Chinese characters used in Japanese, i.e. in ‘kanbun kundoku’. This first reading is important in the history of modern Japanese translation, because when the Japanese first encountered western languages, this method of Chinese translation readings was used for translating English, French, and so on. In short, the Japanese created another style of written Japanese for translation, dating back to the Chinese writing system and distinct from the traditional ancient Japanese language system. After Chinese characters were introduced from China, however, the Japanese also created a style of native Japanese readings. Japanese translators have translated naturally according to their own logic and style.
More...
The article presents the basic principles of designing a diachronic linguistic corpus of documents of the Don Cossack Host offices from the State Archive of the Volgograd region, Russia, including collecting documents for the text corpus, arranging the technical base for automatic processing and text editing, and planning automated tagging, morphological annotation, and corpus software tools. The authors explain some technical aspects of corpus processing and corpus composition. It is considered reasonable to add any document to the corpus, including draft texts with crossed-out fragments, as this ensures an accurate record of the grammar and vocabulary of the language at a given historical period. A set of language marker types is developed for automated meta-tagging. The corpus software tools are defined to enable accurate annotation of obsolete fonts so that they can be processed together with regular language units and expressions in morphological and genre meta-tagging; in cases of partial text adaptation, the authentic old graphic symbols may have to be preserved.
More...
This paper reviews advances in the use of speech recognition (SR) technology in EFL/ESL classrooms over the last few decades, addresses researchers’ and educators’ concerns about the limitations of this technology, and examines how far SR technology has evolved in its own field. Finally, the potential pedagogical implications of SR technology for EFL/ESL, its limitations, and suggestions for further studies are discussed.
More...
In this study we examine the occurrences and correspondences of terms for affinal kinship in a Bulgarian–Ukrainian parallel corpus of fiction. All instances of the terms selected for study, matching and non-matching, were located and counted, and the frequencies compared. Some of the asymmetries found may have roots in culture and history whilst others reflect diverse features of language and the practice of literary translation.
More...
Normalizing historical texts, in other words converting them to modern spelling, enables us to analyze them with tools designed for contemporary language. It also makes it possible to search the texts for different keywords and to compare the old spelling with contemporary spelling automatically. This article gives a general overview of normalization, the different methods, previously performed experiments, and the main problems in the context of old Estonian texts from the second half of the 19th century.
More...
This article deals with the most effective media, or media-adequate ways, for memorizing vocabulary. An empirical study is presented in which participants had to memorize vocabulary in an unknown language in three different ways. Three experimental groups were presented with Hungarian vocabulary to learn. The first group learnt a vocabulary list from a sheet of paper, the second from a computer monitor without any animation, and the third from an animated Flash file. The present article reports and discusses the results of this study.
More...
The paper presents the development, within a research project, of an interactive system of grammatical analysis for texts written in Romanian. The two products realised as practical applications are presented here: a grammar checker for Romanian and an educational application that assists in teaching and learning Romanian (as a foreign language).
More...
Stylometric techniques are usually applied to a limited number of typical tasks, such as authorship attribution, genre analysis, or gender studies. However, they could be applied to several tasks beyond this canonical set, if only stylometric tools were more accessible to users from different areas of the humanities and social sciences. This paper presents a general idea, followed by a fully functional prototype, of an open stylometric system that facilitates wide use through two aspects: technical flexibility and research flexibility. The system relies on a server installation combined with a web-based user interface. This frees the user from the necessity of installing any additional software. At the same time, the system offers a variety of ways in which the input texts can be analysed: they include not only the usual lexical level, but also deep-level linguistic features. This enables a range of possible applications, from typical stylometric tasks to the semantic analysis of text documents. The internal architecture of the system relies on several well-known software packages: a collection of language tools (for text pre-processing), Stylo (for stylometric analysis) and Cluto (for text clustering). The paper presents: (1) the idea behind the system from the user’s perspective; (2) the architecture of the system, with a focus on data processing; (3) features for text description; (4) the use of analytical systems such as Stylo and Cluto. The presentation is illustrated with example applications.
More...
The purpose of this paper is to provide an overview of language policy in France in relation to French and the regional languages. We start the overview from the Renaissance period, when French national sentiment began to form and the distinctiveness of the French nation started to manifest itself, leading to increased use of the French language and the gradual superseding of the regional languages. Because the policy of national unity intensified after the French Revolution in 1789, and the course of action toward the languages of the territory changed accordingly, we divide the overview of language policy in France into two parts: before and after the Revolution. For the revolutionaries, ignorance of the French language was an obstacle to democracy and to the spread of revolutionary ideas, and so the superseding of the regional languages continued throughout the 19th and early 20th centuries. After World War II, the regional languages and cultures received more attention and came to be regarded as a treasure to be preserved and protected from disappearance. According to the relations and the language activities undertaken by France in the contemporary period, we distinguish language policy in relation to the French language and language policy in relation to the regional languages.
More...
The jargon of informatics has developed very quickly in the Romanian language, especially in the last two decades. Computers have influenced the speech of young people, particularly in its lexical aspect. Nouns and verbs borrowed from English and used in chat conversations have also entered the everyday speech of young people. Some of them even give rise to whole lexical groups. There are categories of words that are of interest from the point of view of morphology, semantics, spelling and orthoepy. Anglicisms are lexemes that cause problems in linguistic integration and adaptation.
More...
The problem of the Lithuanization of ancient names was and still is relevant and not fully resolved. It is manifested in the varied spelling of names as well as in ongoing discussions of the matter in the media. Translated books are often subject to criticism: the translators and editors lack the competences needed to understand the patterns of ancient languages and so to Lithuanize the names properly themselves, nor do they have a unified reference. Development of the Digital Database of Ancient Names may be a solution to the problem. The tradition of spelling Latin and Greek ancient names is neither stable nor prevalent in Lithuanian writings. This is evident from the various cases of use in Lithuanian writings collected in the Digital Database of Ancient Names. In fact, there are no grounds for speaking about traditions of Lithuanization, because no such tradition has been formed or established. This is particularly true of the transcription of proper names of Greek origin. We can only note certain tendencies and influences of other languages in different periods.
More...
The paper describes up-to-date methods of teaching and learning foreign languages that are shaped by the integration of information technologies into the teaching process. The Internet offers an excellent opportunity to enliven foreign language lectures and to improve their quality and effectiveness. The main advantage of the Internet is that it provides authentic material in an authentic context. Examples of such authentic material include Web 2.0 (second-generation websites), wikis, blogs and podcasts. The second advantage of the Internet is that it helps learners to cooperate. Teachers can guide learners more effectively and consult with each learner individually. It diversifies the teaching process and encourages the activity of students who have learning difficulties.
More...