CEEOL - Browse Subjects Result


The Use of Lexical Borrowings and their Lithuanian Equivalents in the Computer-Mediated Environment in Students’ Speech

Author(s): Auksė Marmienė / Language(s): English / Issue: 2/2015

The paper aims to establish the differences between lexical borrowings and their Lithuanian equivalents, to introduce the concept of borrowing and review what has been achieved in this field, and to examine the challenges learners face in using specific terminology in the computer-mediated environment. Different types of borrowings are analysed according to their degree of assimilation. The rate of occurrence of borrowings in students’ speech is examined, as well as the reasons for choosing borrowings rather than native words. The factors determining the degree to which borrowings are recognised are age, knowledge of foreign languages and the degree of assimilation of the borrowings.


Application of New Information and Computer Technologies in the Teaching of Foreign Languages

Author(s): Irena Miculevičienė / Language(s): English / Issue: 1/2016

The paper describes up-to-date methods of teaching and learning foreign languages that are shaped by the integration of information technologies into the teaching process. The Internet gives a perfect opportunity to enliven foreign language lectures and to improve their quality and effectiveness. The main advantage of the Internet is its authentic material in an authentic context. Examples of such authentic material include Web 2.0 (second-generation websites), wikis, blogs and podcasts. The second advantage of the Internet is that it helps learners to cooperate. Teachers can guide learners more effectively and consult each learner individually. The Internet diversifies the teaching process and encourages the participation of students who have learning difficulties.


Internetikeele automaatne süntaktiline analüüs kitsenduste grammatikaga [Automatic syntactic analysis of netspeak with Constraint Grammar]

Author(s): Dage Särg / Language(s): Estonian / Issue: 12/2016

The paper provides an overview of an attempt to adapt the Estonian Constraint Grammar rule set for netspeak. The rule set was developed by Kaili Müürisep and Tiina Puolakainen for shallow and dependency parsing of standard written Estonian, and it has previously been adapted for shallow parsing of spoken Estonian by Kaili Müürisep and Heli Uibo. To adapt the rules, a chatroom corpus was first parsed with the existing rule set. The corpus was manually revised and, based on the errors found, changes were made to the rule set. The changes concerned the detection of clause boundaries and particle verbs, as well as the assignment of syntactic tags and dependency relations. The most distinctive features of netspeak proved to be the extensive use of discourse particles and direct address, short sentence length, a small percentage of attributes among the syntactic functions used in the text, and a large number of elliptical sentences from which a predicate, as well as other syntactic functions, can be left out. As a result of adapting the rule set, the results of both shallow and dependency parsing improved. The most error-prone syntactic functions were subjects, predicatives, and adverbials. In dependency parsing, the largest number of errors was made in determining the governors of adverbials.


Извлечение коллокаций из корпуса украинских текстов [Extracting collocations from a corpus of Ukrainian texts]

Author(s): Tatyana Bobkova / Language(s): Russian / Issue: 27/2015

The article describes a method for extracting two-word collocations from a corpus of Ukrainian legislative texts. Existing methods of identifying collocations rest on approaches that differ in their identification criteria and in the sequence of procedures applied. The paper argues for a corpus-oriented approach based on identifying a collocation as a statistically significant unit and on applying corpus methods of text processing. A collocation is defined as a non-random combination of two words that regularly occur together, characteristic both of texts of a particular functional style and of the language as a whole. The method developed for identifying two-word collocations makes it possible, on the basis of statistical processing and the use of lemmatization software, to extract stable two-word combinations automatically from a subcorpus of Ukrainian texts. The extraction results require subsequent editing to resolve homonymy and to determine the grammatically correct collocations. The effectiveness of automatic list generation can be increased by using a larger text corpus and linguistic filters for collocation identification.
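
The following is a minimal sketch of the kind of statistical extraction the abstract describes, not the author's actual procedure. It assumes pointwise mutual information (PMI) as the association measure, which the abstract does not name, and it takes a caller-supplied `lemmatize` function as a stand-in for the lemmatization software. The function name, parameters and thresholds are all hypothetical.

```python
import math
from collections import Counter


def extract_collocations(sentences, lemmatize=lambda w: w, min_count=2, min_pmi=1.0):
    """Return ((lemma1, lemma2), count, pmi) tuples for adjacent lemma pairs.

    PMI is an assumed association measure; `lemmatize` is a placeholder
    for whatever lemmatizer is actually used (identity by default).
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        lemmas = [lemmatize(token.lower()) for token in sentence.split()]
        unigrams.update(lemmas)
        # Adjacent pairs only, never across sentence boundaries.
        bigrams.update(zip(lemmas, lemmas[1:]))

    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    results = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to be called a regular co-occurrence
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        if pmi >= min_pmi:
            results.append(((w1, w2), count, pmi))

    # Strongest associations first.
    return sorted(results, key=lambda item: item[2], reverse=True)
```

Candidates returned this way would still need the manual editing and linguistic filtering that the abstract mentions, for example to resolve homonymy.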


О стыде и связанных с ним понятиях в фокусе Национального корпуса русского языка [On shame and related concepts through the lens of the Russian National Corpus]

Author(s): Andrey Evgenyevich Bochkarev / Language(s): Russian / Issue: 2/2016

The article explores how shame is represented in the Russian National Corpus. Attested word usage makes it possible to identify the most typical contexts, the situations and feelings that correlate with them, and the meanings that shame acquires in the Russian linguistic picture of the world.


Bare Quantifier Fronting as Contrastive Topicalization

Author(s): Ion Giurgea / Language(s): English / Issue: 2/2015

I argue that indefinites (in particular bare quantifiers such as ‘something’, ‘somebody’, etc.) which are neither existentially presupposed nor in the restriction of a quantifier over situations can undergo topicalization in a number of Romance languages (Catalan, Italian, Romanian, Spanish), but only if the sentence contains “verum” focus, i.e. focus on a high degree of certainty of the sentence. I analyze these indefinites as contrastive topics, using Büring’s (1999) theory (where the term ‘S-topic’ is used for what I call ‘contrastive topic’). I propose that the topic is evaluated in relation to a scalar set including generalized quantifiers such as {λP ∃x P(x), λP MANYx P(x), λP MOSTx P(x), λP ∀x P(x)} or {λP ∃x P(x), λP P(a), λP P(b) ...}, and that the contrastive topic is the weakest generalized quantifier in this set. The verum focus, which is part of the “comment” that co-occurs with the “Topic”, introduces a set of alternatives including degrees of certainty of the assertion. The speaker asserts that his claim is certainly true or highly probable, contrasting it with stronger claims for which the degree of probability is unknown. This explains the observation that in downward entailing contexts, the fronted quantified DPs are headed by ‘all’ or ‘many’, whereas ‘some’, small numbers or ‘at least n’ appear in upward entailing contexts. Unlike other cases of non-specific topics, which are property topics, these are quantifier topics: the topic part is a generalized quantifier, the comment is a property of generalized quantifiers. This explains the narrow scope of the fronted quantified DP.


Internetiniai sporto ir politikos straipsnių komentuotojų slapyvardžiai [Internet nicknames of commenters on sports and politics articles]

Author(s): Judita Džežulskienė / Language(s): Lithuanian / Issue: 84/2011

The internet space is a very specific electronic medium with new features favourable for the functioning of personal names. An internet nickname, or a pseudonym, is a false virtual personal name, assumed by a recipient himself/herself rather than given by somebody else (parents, other people). By assuming a nickname, a person wants to be identified, attract other people’s attention rather than just introduce himself/herself. The choice of a virtual nickname is determined by a number of non-linguistic factors. Previously Lithuanian nicknames were mostly used in literature or for political reasons by figures of Lithuanian national movement. The nicknames were mostly used to disguise the actual personal data of the author: gender, age and profession. Nowadays internet nicknames do not only disguise the recipient’s identity, but often also create a new identity. To ensure successful communication, the new identity has features most relevant to the author. Nowadays the genre of comments, fairly popular in the internet discourse, emerges as a result of opportunities of communication offered by internet publications. The paper analyses the names assumed by the authors of internet comments who reply and react to articles published in the electronic space. The data has been collected from www.lrytas.lt, a site of political and sports news. The research results have demonstrated that part of the collected data could be treated as true nicknames, whereas the others are their functional equivalents, replacing the title, beginning of the comment or the addressee’s reference. The overview of true nicknames leads to a conclusion that the commentators make ample use of personal names, nationality, place of origin or living; less frequently they indicate their social status, professional or other affiliation, refer to a characteristic feature. To ensure their attractiveness or successful pragmatic effect, they often violate the language norm applicable to the structure and spelling of language units.


A preliminary study in zero anaphora coreference resolution for Polish

Author(s): Adam Jan Kaczmarek, Michał Marcińczuk / Language(s): English / Issue: 17/2017

Zero anaphora is an element of the coreference resolution task that has not yet been directly addressed in Polish and, in most studies, it has been left as the most challenging aspect for further investigation. This article presents an initial study of this problem. The preparation of a machine learning approach, alongside engineering features based on linguistic study of the KPWr corpus, is discussed. This study utilizes existing tools for Polish coreference resolution as sources of partial coreferential clusters containing pronoun, noun and named entity mentions. They are also used as baseline zero coreference resolution systems for comparison with our system. The evaluation process is focused not only on clustering correctness, without taking into account types of mentions, using standard CoNLL-2012 measures, but also on the informativeness of the resulting relations. According to the coreference annotation approach used in the KPWr corpus, only named entities are treated as mentions that are informative enough to constitute a link to real world objects. Consequently, we provide an evaluation of informativeness based on the links found between zero anaphoras and named entities. For the same reason, we restrict coreference resolution in this study to mention clusters built around named entities.


Corpus Linguistics and the Lexicon

Author(s): Barbara Lewandowska-Tomaszczyk / Language(s): English / Issue: 36/1997

The author analyses the place and functions of language corpora in the lexicographic analysis of language and in its lexicographic applications. The issues examined concern the acquisition of lexical knowledge from linguistic corpus data, the reuse of that knowledge in monolingual and multilingual lexicography tasks, and the possible implications of such methodologies for the analysis of natural-language vocabulary. The paper addresses the automatic analysis of linguistic corpus data and presents examples of it based on English material.


Korpusový výzkum mluveného jazyka na příkladu češtiny a angličtiny: současný stav [Corpus research on spoken language, with Czech and English as examples: the current state]

Author(s): Anna Čermáková, Marie Kopřivová / Language(s): Czech / Issue: 3/2018

The article aims to review corpus-based research on spoken language, emphasizing issues in description and conceptualization of the grammar of spoken language in relation to the grammar of written language. The review first briefly looks at the development of spoken corpora, from simply transcribed corpora without sound alignment to today’s sophisticated multi-modal corpora. The main part of the article deals with issues concerning the metalanguage for the description of spoken language, the choice of its basic descriptive unit, the status of basic linguistic categories such as part-of-speech, and typical lexical and grammatical devices. The existing extensive research on spoken English is reviewed and, in line with it, illustrative examples based on Czech spoken corpora are provided. These are further contrasted with examples from written data to highlight the inherent differences between spoken and written language and the need to adjust the descriptive metalanguage.


Variabilita češtiny: multidimenzionální analýza [Variability of Czech: a multidimensional analysis]

Author(s): Václav Cvrček, Zuzana Komrsková, David Lukeš, Petra Poukarová, Anna Řehořková, Adrian Jan Zasina / Language(s): Czech / Issue: 4/2018

The article summarizes the theoretical foundations and results of a corpus-driven study of register variability in contemporary Czech. The descriptive framework is based on the methodology of multidimensional analysis, as previously applied to various other languages (see Biber 1995). The starting point is a quantitative analysis of a custom-built genre-diversified corpus in which linguistic features have been identified that are likely to be related to functional and systematic variability on different linguistic levels. Statistical processing using factor analysis then yields a model which identifies (in the case of Czech) 8 dimensions of variation of the texts. The greatest proportion of variance is explained by the first two dimensions, which can be described as dichotomies distinguishing between dynamic vs. static and spontaneous vs. prepared.


Diachronní korpusová lingvistika a španělština: současný stav a problémy [Diachronic corpus linguistics and Spanish: current state and problems]

Author(s): Zuzana Krinková / Language(s): Czech / Issue: 1/2018

The first aim of the article is to address major problems of current historical corpus linguistics such as representativeness in genre, place and time, transcription of historical texts, etc. The second goal is to introduce the reader to traditional and innovative historical corpora of Spanish, focusing on their characteristics, advantages and limitations.


Hluboké učení v automatické analýze českého textu [Deep learning in the automatic analysis of Czech text]

Author(s): Jana Straková, Milan Straka, Jan Hajič, Martin Popel / Language(s): Czech / Issue: 4/2019

Deep learning methods based on artificial neural networks have seen significant uptake in recent years and have surpassed earlier approaches to automatically solved tasks in many fields. Computational linguistics and its applied offshoot, natural language processing, with classic tasks such as morphological tagging, dependency parsing, named entity recognition and machine translation, are no exception. This paper provides an overview of recent advances in these tasks for the Czech language and presents entirely new results in morphological tagging and named entity recognition in Czech, along with a detailed error analysis.


Problematique de la terminologie de l'internet en agni [The problem of Internet terminology in Agni]

Author(s): Assouan Pierre Andredou, Laurent Ehire / Language(s): French / Issue: 2/2019

The concept of ICT has become a central theme attracting the attention of researchers from various disciplines. All agree that ICTs are now essential to development in Africa. Among these ICTs, the Internet appears to be the technology that conveys the most hope. The progress of the Internet has led to the emergence of a new society: the information society. The development of this society implies an agenda of linguistic adjustment for all African peoples. Indeed, African languages, through their exogenous contributions, lead to adjusting endogenous efforts towards sustainable development. All in all, in the current context marked by the intensive use of mother tongues in many activities, the contribution of the Internet cannot ignore these languages. However, it should be recognized that Ivorian languages in general, and Agni in particular, can only claim the status of mediums for Internet use if they are instrumentalized, i.e. modernized with terminology capable of naming this computer tool. This analysis aims to explore the ability of the Agni language to identify new realities in the Internet domain.


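Le système littéraire de la Romania européenne : une analyse métacritique au prisme de la criticométrie [The literary system of European Romania: a metacritical analysis through the lens of criticometrics]

Author(s): Carolina Ferrer / Language(s): French / Issue: 1/2020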

Conceptually, this research arises at the intersection of systems theory, scientometrics, and literary studies. From the methodological viewpoint, we introduce criticometrics, an innovative approach that aims at empirically studying national and continental literatures, as well as the relations between them. Thus, through the exploitation of bibliographic databases, particularly the Modern Language Association International Bibliography, we compile and analyze the metadata of the critical publications about the literatures that constitute the European Romania. This sample, which comprises over 430,000 references from 1884 to 2016, allowed us to map these national literatures and to elaborate chronological, geopolitical, and linguistic indicators. We then studied the critical bibliography about the main figures of these national literatures in order to identify their level of interaction. All these observations, which have become available with the emergence of the digital era, allow us to better understand the interferences and contrasts that characterize the literary system of the European Romania.


Despre corpusurile electronice românești. Inimă: câteva e-ocurențe sintagmatice [On Romanian electronic corpora. Inimă: some syntagmatic e-occurrences]

Author(s): Daniela Gheltofan / Language(s): Romanian / Issue: 1/2020

The aim of this paper is to present some of the electronic corpora of the Romanian language, especially since corpora have increasingly been used in linguistic studies over the past decades. Many scholars from different research branches acknowledge the value of a national electronic language corpus. Used as a research tool, it allows for easy queries and yields interesting results about linguistic behaviour. However, the use of Romanian electronic corpora in the academic field is very recent, because the corpora were created less than two decades ago. For example, the Contemporary Romanian Language corpus (CoRoLa) was initiated in 2014 (cf. Tufiș 2018). In this paper, we provide some examples to illustrate the application of the Romanian electronic corpus CoRoLa with the keyword “inimă” (heart).


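Homonymie mezi oikonymy a antroponymy zakončenými na -slav/-slava jako problém automatické morfologické analýzy [Homonymy between oikonyms and anthroponyms ending in -slav/-slava as a problem for automatic morphological analysis]

Author(s): Klára Osolsobě, Hana Žižková / Language(s): Czech / Issue: 2/2021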

Homonymy at all levels, which is a distinct feature of all natural languages, is also one of the most significant obstacles to automatic natural language processing. In this paper, we will point out the morphosyntactic differences of Czech anthroponyms ending in -slav (Miroslav-type, masculine) and Czech oikonyms with the same ending (Miroslav-type, feminine) and Czech anthroponyms ending in -slava (Miroslava-type feminine, because its forms are homonymous with both: masculine anthroponyms and feminine oikonyms). The analysis of data from the Syn v8 corpus shows that word form homonymy significantly influences the results of automatic morphological analysis. We will document errors in the coverage of the automatic analyzer dictionary and, above all, errors in morphological tagging, and we will propose a solution to partially improve the automatic disambiguation of the given type of proper nouns.


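Změny v morfologické anotaci korpusů řady SYN: nové možnosti zkoumání české gramatiky a lexikonu [Changes in the morphological annotation of the SYN series corpora: new possibilities for investigating Czech grammar and lexicon]

Author(s): Jan Křivan, Jana Šindlerová / Language(s): Czech / Issue: 2/2022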

This paper introduces some major conceptual enhancements to the morphological annotation of the SYN series corpora of the Czech National Corpus. Apart from minor changes in tokenization and in the positional tagset, three major conceptual changes have been applied which affect the representation of various lexical and grammatical patterns. In the paper, we present the actual impact of the changes in linguistic data and search for possibilities in three linguistic areas. First, the treatment of phonic, graphemic, and morphological variants via a two-tier lemma structure is discussed; second, a new approach to periphrastic verb forms, auxiliaries, participles and the interpretation of verbal grammatical categories through a new attribute, called verbtag, is explained; and third, a complex multi-value treatment of multiword tokens is introduced.


58. konference Leibnizova ústavu pro německý jazyk [58th Conference of the Leibniz Institute for the German Language]

Author(s): Martin Šemelík, Marie Vachková / Language(s): Czech / Issue: 3/2022

This text is a report from the international conference "Corpora in German Linguistics: Oral, Written and Multimedia", organized by the Leibniz Institute for the German Language and held online on March 15–17, 2022.


Distribution of Terms Across Genres in the Annotated Lithuanian Cybersecurity Corpus

Author(s): Sigita Rackevičienė, Andrius Utka, Agnė Bielinskienė, Aivaras Rokas / Language(s): English / Issue: 41(46)/2022

The paper presents the results of a frequency distribution analysis of cybersecurity terms in a Lithuanian cybersecurity corpus composed of texts of different genres. The research focuses on the following aspects: the overall distribution of cybersecurity terms (their density and diversity) across genres; the distribution of English and English-Lithuanian terms and their usage patterns in Lithuanian sentences; and the most frequent cybersecurity terms and their thematic groups in each genre. The research was performed in several stages: compilation of a cybersecurity corpus and its subdivision into genre-specific subcorpora, manual annotation of cybersecurity terms, automatic lemmatisation of the annotated terms and, finally, quantitative analysis of the distribution of the terms across the subcorpora. The results reveal similarities and differences in the use of cybersecurity terminology across genres, which must be considered to obtain a complete picture of terminology usage trends in this domain.
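
As a rough illustration of the final quantitative stage, the sketch below computes per-genre figures from already annotated and lemmatised terms. The definitions are assumptions, since the abstract does not give formulas: density is taken here as term occurrences per 1,000 tokens and diversity as the number of distinct term lemmas. The function name and the input layout are hypothetical.

```python
from collections import Counter


def term_distribution(subcorpora, top_n=3):
    """Compute simple term statistics per genre.

    `subcorpora` maps a genre name to a dict with:
      "tokens": total token count of the subcorpus (int)
      "terms":  list of lemmatised term occurrences (list of str)
    Density and diversity are assumed definitions, not the paper's.
    """
    stats = {}
    for genre, data in subcorpora.items():
        counts = Counter(data["terms"])
        occurrences = sum(counts.values())
        stats[genre] = {
            # Term occurrences per 1,000 running tokens.
            "density_per_1000": 1000 * occurrences / data["tokens"],
            # Number of distinct term lemmas.
            "diversity": len(counts),
            # Most frequent terms with their counts.
            "top_terms": counts.most_common(top_n),
        }
    return stats
```

Comparing the resulting dictionaries side by side gives a first view of how term usage differs between genre-specific subcorpora.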
