The data-driven Bulgarian WordNet: BTBWN

Author(s): Kiril Simov, Petya Osenova
Subject(s): Language studies, Media studies, Syntax, Lexis, Semantics
Published by: Instytut Slawistyki Polskiej Akademii Nauk
Keywords: Bulgarian WordNet; WordNet mappings; data-driven WordNet construction

Summary/Abstract: The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a treebank manually annotated with semantic information. This approach requires the word senses to be synchronized across both resources, the syntactic one and the lexical one, without restricting the WordNet senses to those found in the corpus or vice versa. Our strategy focuses on identifying the senses used in BulTreeBank, but senses of a lemma that do not occur there have also been covered by exploring larger corpora. The identified senses have been organized into synsets for the Bulgarian WordNet, and these synsets have then been aligned to the synsets of the Princeton WordNet. Various types of mappings between the two resources are considered from a cross-lingual perspective, with the aims of ensuring maximum connectivity and making it possible to incorporate language-specific concepts. The mapping between the two WordNets (English and Bulgarian) provides a basis for applications such as machine translation and multilingual information retrieval.

  • Issue Year: 2018
  • Issue No: 18
  • Page Range: 1-11
  • Page Count: 11
  • Language: English