Semantic Author Recommendations Based on their Biography from the General Romanian Dictionary of Literature

Author(s): Laurențiu-Marian Neagu, Teodor-Mihai Coteț, Mihai Dascălu, Ştefan Trăuşan-Matu, Laura Bădescu, Eugen Simion
Subject(s): Social Sciences, Education, Higher Education
Published by: Carol I National Defence University Publishing House
Keywords: Clustering; Text Categorization; Text Mining; Analysis of General Romanian Dictionary of Literature; Author Recommendations; Adaptive Technologies

Summary/Abstract: The General Romanian Dictionary of Literature (DGLR) is a centralized text repository that contains detailed biographies of all Romanian authors and can support a range of subsequent analyses. This paper introduces a novel method for recommending authors based on their DGLR biographies. Starting from multiple input files made available by the “G. Călinescu” Institute of Literary History and Theory, we extracted relevant information on Romanian authors covering the letters [A-D] and indexed it into Elasticsearch, a non-relational database optimized for full-text indexing and search. The extracted information includes each author’s full name, pseudonym (if any), years and places of birth and death (if applicable), a brief description (studies, cities they lived in, important people they met, brief history), writings, critical references by others, etc. The indexed information is easily accessible through a RESTful API and provides a powerful starting point that may contribute to future findings on Romanian culture. Our aim is to create an interactive map of all contributors to Romanian literature that enables the identification of similarities and differences between them based on specific features (e.g., similar writing styles, time periods, or similar text descriptions in terms of semantic models). To obtain a clearer picture of how authors relate to one another, we applied the kNN algorithm to a set of integrated features: authors’ descriptions projected into a reduced fastText embedding space, overlap of biographic references and professions, and closeness in terms of publishing periods. This paper is a proof of concept that uses only the first two volumes of DGLR and represents the first step toward follow-up analyses on the indexed dictionary.
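
The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Author` record, the weights, the `period_closeness` formula and the brute-force neighbor search are all assumptions made for illustration, and the embeddings are taken as given (for instance, precomputed with fastText and reduced beforehand).

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Author:
    """Hypothetical record; field names are illustrative, not from the paper."""
    name: str
    embedding: tuple[float, ...]   # reduced description embedding (assumed precomputed)
    references: frozenset[str]     # biographic references
    professions: frozenset[str]
    active: tuple[int, int]        # first and last publishing year


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as no overlap."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def period_closeness(p, q, scale=50.0):
    """Assumed closeness measure: decays with the gap between period midpoints."""
    gap = abs((p[0] + p[1]) / 2 - (q[0] + q[1]) / 2)
    return 1.0 / (1.0 + gap / scale)


def similarity(a, b, weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of the four features; the weights are arbitrary assumptions."""
    parts = (
        cosine(a.embedding, b.embedding),
        jaccard(a.references, b.references),
        jaccard(a.professions, b.professions),
        period_closeness(a.active, b.active),
    )
    return sum(w * p for w, p in zip(weights, parts))


def recommend(target, authors, k=3):
    """Return the names of the k authors most similar to target (brute-force kNN)."""
    others = [a for a in authors if a.name != target.name]
    ranked = sorted(others, key=lambda a: similarity(target, a), reverse=True)
    return [a.name for a in ranked[:k]]
```

Calling `recommend(target, authors, k=3)` returns the three nearest neighbors of `target` under this combined score. A brute-force scan is adequate for a few thousand authors; a larger collection would more plausibly rely on an indexed nearest-neighbor search.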

  • Issue Year: 15/2019
  • Issue No: 01
  • Page Range: 165-172
  • Page Count: 8
  • Language: English