Language Technology Resources and Tools for Mansi

Author(s): Csilla Horváth, Ágoston Nagy, Norbert Szilágyi, Veronika Vincze
Subject(s): Language and Literature Studies
Published by: Scientia
Keywords: Mansi; Finno-Ugric
Summary/Abstract: In our paper, we offer an overview of language technology tools and resources, both existing and under development, for an endangered minority language, Mansi, a Finno-Ugric language spoken in Western Siberia, Russia. We pay special attention to lexical resources and morphological analysers, and we also briefly present our efforts to contribute to the field of Mansi language technology. Instead of starting from the larger dictionaries compiled by European researchers, we decided to begin our work with the smaller but more actively used dictionaries published by Mansi researchers. The beta version of the online Mansi dictionary now contains approximately 15,000 entries, while another lexical resource, a wordnet, is also being constructed for Mansi. We chose to create a new morphological analyser for Mansi from scratch. From among the many currently available finite-state transducer frameworks, the HFST standard was chosen so that the analyser could be integrated into the framework used at the Giellatekno website. In order to test our morphological tools, we have started to create a Mansi corpus which consists of the articles published in the Mansi newspaper Luima Seripos from 2013.

  • Page Range: 199-208
  • Page Count: 10
  • Publication Year: 2019
  • Language: English