Korpusy komputerowe języków słowiańskich
Corpora of Slavic languages

Author(s): Beata Chachulska, Rafał L. Górski
Subject(s): Language studies
Published by: Instytut Slawistyki Polskiej Akademii Nauk

Summary/Abstract: The aim of this paper is to present corpora of Slavic languages. For almost every Slavic language a corpus has either been compiled or will be finished very soon, and some languages can be studied with the help of several corpora. To the authors' knowledge, the exceptions are Belarusian, Kashubian (if we agree that it is a language and not a dialect) and Macedonian. The corpora are mostly accessible via the Internet and meet the standards set by the British National Corpus: their size ranges from 30 to 00 million running words, and they are balanced and morphosyntactically annotated. Interestingly, there is no correlation between the position of a given language and the quality of its corpus. Countries with relatively small populations (e.g. Slovenia) can afford large and sophisticated corpora, whereas Russian has several corpora, none of which meets the standards required today.


Journal: Studia z Filologii Polskiej i Słowiańskiej

Issue Year: 2005
Issue No: 40
Page Range: 483-507
Page Count: 25
Language: Polish
