Metrics for Assessing the Similarity in Text Documents

Author(s): Martin Ivanov, Mariyana Raykova
Subject(s): Education, ICT Information and Communications Technologies
Published by: Нов български университет
Keywords: semantic analysis; similarity of text documents; text mining

Summary/Abstract: Finding appropriate metrics for assessing the similarity of information content in unstructured texts is an essential element of many text mining tasks, such as classification, clustering and semantic analysis of text documents. The choice of such metrics is a multivariate task, and the decision depends on the initial conditions and the objectives of the research. This paper presents, analyzes and classifies the different possibilities for constructing and using metrics of text similarity, and considers ways to extend and improve them. It discusses the advantages and disadvantages of the currently known approaches for assessing the similarity between documents. Techniques that combine the useful features of these approaches are offered. A type of structure is proposed that represents the concepts contained in the original text. The purpose of the study is to summarize the main opportunities for the development of metrics for assessing the similarity in text documents.

  • Issue Year: 12/2016
  • Issue No: 1
  • Page Range: 63-77
  • Page Count: 15
  • Language: English