Corpus-oriented lexicographic database for Beserman Udmurt


Author(s): Timofey Arkhangelskiy, Natalia Serdobolskaya, Maria Usacheva
Subject(s): Applied Linguistics, Lexis, Computational linguistics, ICT Information and Communications Technologies
Published by: Akadémiai Kiadó
Keywords: lexicography; Udmurt; Beserman; online dictionary; spoken corpus

Summary/Abstract: The Beserman Udmurt documentation project is a long-term undertaking aimed primarily at collecting lexicographic and corpus data in the field. In the course of the project, we developed a pipeline for collecting, annotating and publishing our data. In this paper, we describe this pipeline and present the web interface we developed to provide public access to Beserman materials. We use the TLex lexicographic software for working on the dictionary and FieldWorks FLEx for annotating the corpus. After the data have been annotated, they are exported to XML and stored in the web interface, where the two types of data become interconnected and searchable. We propose solutions to challenges that arise in projects of this kind and reflect on the constraints imposed on lexicographic databases developed in long-term projects aimed at describing under-resourced languages. We suggest that the proposed pipeline and web interface could be employed by similar projects dealing with other minority languages. The web interface, based on the database and a corpus of oral Beserman texts, is available online at beserman.ru.


Journal: Acta Linguistica Academica. An International Journal of Linguistics (Until 2016 Acta Linguistica Hungarica)

Issue Year: 64/2017
Issue No: 3
Page Range: 397-415
Page Count: 19
Language: English
