How lemmatisation and derivational annotation affect productivity measures: The case of deverbal agent nouns in the Joint Corpus of Lithuanian
Author(s): Jurgis Pakerys, Virginijus Dadurkevičius, Agnė Navickaitė-Klišauskienė
Subject(s): Morphology, Syntax, Semantics, Baltic Languages, Philology
Published by: Latvijas Universitātes Akadēmiskais apgāds
Keywords: word formation; derivational productivity; agent nouns; Lithuanian
Summary/Abstract: We discuss the automatic and manual stages of the lemmatisation and annotation of the Joint Corpus of Lithuanian (1.3 billion words) used to measure derivational productivity. As a case study, we present data on three productive deverbal agent noun suffixes in Lithuanian, -toj-, -ėj-, and -ik-, and measure their realized, expanding, and potential productivity. We show that additional semi-automatic lemmatisation and manual derivational annotation significantly increase type and hapax counts. We also note that lemmatisation artificially inflates the number of lemmas because homographic forms are left unresolved by the lemmatiser. After manual disambiguation of the hapaxes, the counts of feminine formations in -toj-(a) and -ėj-(a) showed the largest reductions.
Journal: Valoda: nozīme un forma
- Issue Year: 2024
- Issue No: 15
- Page Range: 138-151
- Page Count: 14
- Language: English
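
The abstract names three productivity measures without defining them. The sketch below assumes they are used in the sense common in corpus-based morphology: realized productivity as the type count of a category, expanding productivity as the category's share of all hapaxes in the corpus, and potential productivity as the category's hapaxes divided by its tokens. The function name, the input layout, and the toy counts are illustrative assumptions and are not taken from the article or the corpus.

```python
from collections import Counter


def productivity_measures(category_lemmas, corpus_hapax_total):
    """Compute three productivity measures for one derivational category.

    category_lemmas: one lemma per token belonging to the category
                     (hypothetical input layout).
    corpus_hapax_total: number of hapaxes in the whole corpus.
    """
    if not category_lemmas or corpus_hapax_total <= 0:
        raise ValueError("need at least one token and a positive hapax total")

    freq = Counter(category_lemmas)
    types = len(freq)                                   # distinct lemmas
    tokens = sum(freq.values())                         # all occurrences
    hapaxes = sum(1 for n in freq.values() if n == 1)   # lemmas seen once

    return {
        "realized": types,
        "expanding": hapaxes / corpus_hapax_total,
        "potential": hapaxes / tokens,
    }


# Toy data only: three -toj- lemmas with made-up frequencies.
toy = ["mokytojas"] * 3 + ["skaitytojas"] * 2 + ["rašytojas"]
result = productivity_measures(toy, corpus_hapax_total=4)
```

Because all three values depend on type, token, and hapax counts, a homographic form left unresolved by the lemmatiser and counted as a separate lemma would raise the type count, and possibly the hapax count, which is the kind of inflation the abstract describes.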