Text Vectorization Techniques Based on Wordnet

Author(s): Dávid Držík, Kirsten Šteflovič
Subject(s): Language and Literature Studies, Theoretical Linguistics, Applied Linguistics, Lexis, Semantics
Published by: Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: word embedding; Word2Vec; GloVe; synsets; text data augmentation; semantic similarity

Summary/Abstract: Text vectorization techniques have become essential for many classification tasks in present-day natural language processing. Word embedding methods commonly used today, such as Word2Vec and GloVe, are based on the semantic similarity of words. WordNet, as a lexical database of words, provides a rich source of semantic information. In our article, we propose a text vectorization technique that uses text data extended through data augmentation, specifically by replacing words with their synonyms obtained from WordNet. The results obtained from text classification tasks using multiple classifiers demonstrate that expanding the corpus with this method leads to improved vector representations of words.
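
The synonym-replacement augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the procedure's details, so the replacement probability `p`, the number of augmented `copies`, and the function names `augment` and `expand_corpus` are all assumptions made for illustration. A small hand-written dictionary, `TOY_SYNONYMS`, stands in for synonym lookups in WordNet so that the example runs without external data.

```python
import random

# Hypothetical stand-in for WordNet synonym lookups (assumption for illustration);
# a real pipeline would query WordNet synsets instead of this toy dictionary.
TOY_SYNONYMS = {
    "film": ["movie"],
    "good": ["fine", "decent"],
    "big": ["large"],
}


def augment(tokens, synonyms, rng, p=0.5):
    """Return a copy of `tokens` where each word that has synonyms
    is replaced by a randomly chosen synonym with probability `p`."""
    out = []
    for tok in tokens:
        candidates = synonyms.get(tok, [])
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


def expand_corpus(corpus, synonyms, copies=1, seed=0, p=0.5):
    """Keep every original sentence and append `copies` augmented
    variants of each one, producing a larger training corpus."""
    rng = random.Random(seed)
    expanded = [list(sentence) for sentence in corpus]
    for sentence in corpus:
        for _ in range(copies):
            expanded.append(augment(sentence, synonyms, rng, p))
    return expanded


corpus = [["a", "good", "film"], ["a", "big", "house"]]
expanded = expand_corpus(corpus, TOY_SYNONYMS, copies=2)
for sentence in expanded:
    print(" ".join(sentence))
```

Under these assumptions, the expanded corpus (original sentences plus their synonym-substituted variants) would then be passed to an embedding trainer such as Word2Vec or GloVe in place of the original corpus.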

  • Issue Year: 74/2023
  • Issue No: 1
  • Page Range: 310-322
  • Page Count: 13
  • Language: English