Паралелни корпуси у Србији – могућности за паралелно проналажење информација на два или више језика
Parallel Corpora in Serbia – Possibilities for Simultaneous Information Retrieval in Two or More Languages

Author(s): Jelena Andonovski
Subject(s): Library and Information Science, Information Architecture, Electronic information storage and retrieval
Published by: Библиотекарско друштво Србије
Keywords: corpus linguistics; language corpora; parallel corpora; natural language processing; information retrieval

Summary/Abstract: Aligned multilingual corpora have become essential resources in multilingual Natural Language Processing (NLP) in recent decades, as well as one of the major resources for researchers in various areas of linguistics and related language disciplines. Parallel corpora are language corpora that contain a collection of one or more original texts in one language and their translations into one or more other languages. Original texts and their translations are aligned at some level of text division (e.g. sentence, paragraph, or chapter). In most cases, parallel corpora contain texts in only two languages, but there are also examples of monolingual parallel corpora, which contain a collection of different editions of the same text in one language. In Serbia, JeRTeh, the Language Resources and Technologies Society (formerly the Group for Language Technologies), has been developing parallel corpora containing Serbian texts for decades.
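
To make the idea of sentence-level alignment concrete, the following is a minimal sketch in Python, not a description of how any JeRTeh corpus is stored or queried. The class `AlignedSegment`, the function `search`, the language codes, and the two example sentences are all hypothetical and invented for illustration. Each segment holds the same sentence in several languages, so a match found in one language brings the aligned text in the other language with it.

```python
from dataclasses import dataclass


@dataclass
class AlignedSegment:
    """One alignment unit (here: a sentence) with its text in each language."""
    segment_id: str
    texts: dict  # language code -> text of this segment in that language


def search(corpus, query, lang):
    """Return every segment whose text in `lang` contains `query` (case-insensitive)."""
    needle = query.casefold()
    return [seg for seg in corpus if needle in seg.texts.get(lang, "").casefold()]


# Invented example sentences; not taken from any real corpus.
corpus = [
    AlignedSegment("s1", {"en": "The library opens at nine.",
                          "sr": "Biblioteka se otvara u devet."}),
    AlignedSegment("s2", {"en": "The book is on the table.",
                          "sr": "Knjiga je na stolu."}),
]

# Query in English; each hit also carries its aligned Serbian sentence.
for seg in search(corpus, "library", "en"):
    print(seg.segment_id, seg.texts["en"], "|", seg.texts["sr"])
```

The same function can be queried from the other side, for example `search(corpus, "knjiga", "sr")`, which is the sense in which retrieval over an aligned corpus is simultaneous in two languages. A real system would add tokenization, lemmatization, and an index rather than a linear substring scan.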

  • Issue Year: LXIII/2021
  • Issue No: 1
  • Page Range: 51-74
  • Page Count: 24
  • Language: Serbian