Light Verb Constructions in ELEXIS-WSD – Annotation, Comparisons and Issues
Author(s): Cvetana Krstev, Ranka Stanković, Aleksandra Marković
Subject(s): Language and Literature Studies, Economy, Theoretical Linguistics, Applied Linguistics, Computational linguistics, ICT Information and Communications Technologies
Published by: Institute for Bulgarian Language "Prof. Lyubomir Andreychin", Bulgarian Academy of Sciences
Keywords: light verb constructions; annotation; ELEXIS-WSD; Serbian; Bulgarian; Slovene; English.
Summary/Abstract: This paper deals with light verb constructions (LVCs) and their annotation in ELEXIS-sr, the Serbian extension of the ELEXIS-WSD corpus. Section 1 gives general introductory remarks about these constructions, the notion of light verbs, and their treatment and further classification in the PARSEME annotation guidelines (subtypes LVC.full and LVC.cause). Section 2 offers an insight into the ELEXIS-WSD corpus, which is annotated with verbal multiword expressions (VMWEs) for several languages, and notes that these VMWEs were not further subcategorised into finer classes. For this paper, we classified them ourselves to facilitate comparison with the LVCs annotated in ELEXIS-sr. Section 3 presents the tools and resources used for the automatic annotation of ELEXIS-sr, as well as the results of manual checking. In Section 4, we compare LVCs in four ELEXIS-WSD sub-collections: Serbian, Bulgarian, Slovene, and English. We use Serbian as the starting point for this comparison, as it has been thoroughly annotated with multiword expressions (MWEs) and named entities (NEs). We present the results of comparing all occurrences of LVCs in the Serbian extension with their occurrences and annotation both in ELEXIS-WSD and in the PARSEME sub-corpora for other languages. An important conclusion is that the largest number of LVC equivalents is found between Serbian and Bulgarian, closely related Slavic languages (34 equivalents in total), while Serbian and Slovene, also Slavic, share 11 equivalents, the same number as Serbian and English. This may be explained by the number of VMWEs and LVCs annotated in each sub-collection, or by the annotation strategies adopted by different annotators.
Journal: Computational Linguistics in Bulgaria
- Issue Year: 1/2025
- Issue No: 1
- Page Range: 42-60
- Page Count: 19
- Language: English
