Statystyczne metody klasyfikacji tekstów
Statistical methods of text classification

Author(s): Adam Idczak, Jerzy Korzeniewski
Subject(s): Economy
Published by: Wydawnictwo Uniwersytetu Łódzkiego
Keywords: sentiment of document; document classification; machine learning methods; linear correlation; SVM method; naive Bayes classifier
Summary/Abstract: In recent years, with the rapid development of computer and Internet technologies, computational text-mining methods have become increasingly important. The capabilities of computer systems can be applied in areas such as text summarization, information retrieval, text correction, topic identification, machine translation, lexicon creation, and sentiment detection. This monograph focuses on sentiment analysis in the most common sense of the term, i.e. the sentiment of a whole document. Emphasis is placed on binary classification (two groups of documents), on avoiding external sources, and on using a training set of the smallest possible size. The monograph’s aims are: to provide a comparative review of the sentiment analysis methods found in the literature; to investigate the quality of selected document sentiment classification methods when applied to documents written in Polish; and to propose new methods that improve classification quality or offer other advantages. An original method with a simple interpretation is proposed; it proved better than the standard methods used to classify English-language documents, especially for document corpora with a similar number of documents in both classes. The research was carried out on thirteen document sets from different, independent sources.

  • E-ISBN-13: 978-83-8220-787-3
  • Print-ISBN-13: 978-83-8220-786-6
  • Page Count: 142
  • Publication Year: 2022
  • Language: Polish