Automated (Semantics Driven) Data Retrieval from Fiscal Documents: A Comprehensive Approach

Author(s): Vasile Minea, Cornel Stan, Gheorghe-Dragoș Florescu, Costin Lianu, Cosmin Lianu
Subject(s): Electronic information storage and retrieval, Semantics, Computational linguistics, ICT Information and Communications Technologies
Published by: Editura Fundaţiei România de Mâine
Keywords: automated data retrieval; semantics-driven; fiscal documents; comprehensive approach; data extraction; semantic technology; information retrieval; document analysis; machine learning; financial data;

Summary/Abstract: The importance of paper documents in the regular business flow should not be underestimated. They remain an important part of the business domain's increasingly digital landscape, complementing digital solutions by providing added transparency, reliability and security. Making prompt decisions in the business world requires fast access to relevant and up-to-date data, and working with paper-based documents is very inefficient. Digitization of documents is ubiquitous, and digital document management systems (DMS) play an important role in fields such as science, business and health. In the business domain, Enterprise Resource Planning (ERP) systems represent an entire ecosystem of solutions meant to address every aspect of the business process in a unified way. An important factor in successful ERP implementations is the integration of the DMS into the ERP. Enabling automated retrieval of data from all kinds of fiscal paper documents into the ERP is the next logical step. In this paper, we provide a hands-on approach to the task of automated text retrieval from fiscal documents. The novelty of our work lies in the way we address the semantics of the retrieved data: the system associates meaning with the retrieved text elements, which also eases the processing of future documents. The solution is presented in a generic form, with a thorough discussion of its technological aspects, and is then implemented in the ERP system. We present and discuss experimental results, and finally draw conclusions and outline several ideas for further developing our work.
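The abstract does not detail how meaning is attached to the retrieved text, so the following Python sketch is illustrative only and is not the authors' implementation. It assumes OCR has already produced plain text lines, ties each value to a semantic field name through hypothetical label patterns (`FIELD_PATTERNS`), and keeps a hypothetical per-issuer template (`TemplateStore`) recording the line on which each field was found, so that the next document from the same issuer is searched starting from those lines. The field names, the regular expressions and the use of the VAT ID as the issuer key are all assumptions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical label patterns (assumption): each semantic field name is tied
# to the labels that typically precede its value in OCR text (English/Romanian).
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"(?:invoice\s*(?:no\.?|number)|factura\s*nr\.?)\s*[:#]?\s*(\S+)", re.I),
    "issue_date": re.compile(
        r"(?:date|data)\s*:?\s*(\d{2}[./-]\d{2}[./-]\d{4})", re.I),
    "total": re.compile(r"total\s*:?\s*([\d.,]+)", re.I),
    "vat_id": re.compile(r"(?:vat\s*id|cif|cui)\s*:?\s*([A-Z]{0,2}\d+)", re.I),
}


@dataclass
class TemplateStore:
    """Hypothetical store: per issuer VAT ID, the OCR line where each field was found."""
    templates: dict[str, dict[str, int]] = field(default_factory=dict)


def find_field(name: str, lines: list[str],
               hint: Optional[int]) -> Optional[tuple[str, int]]:
    """Return (value, line index) for a field, trying the remembered line first."""
    order = list(range(len(lines)))
    if hint is not None and 0 <= hint < len(lines):
        order.remove(hint)
        order.insert(0, hint)
    for i in order:
        match = FIELD_PATTERNS[name].search(lines[i])
        if match:
            return match.group(1), i
    return None


def extract_fields(lines: list[str], store: TemplateStore) -> dict[str, str]:
    """Label values found in OCR text lines with semantic field names."""
    # The issuer's VAT ID selects a previously learned template, if any.
    issuer = find_field("vat_id", lines, None)
    template = store.templates.get(issuer[0], {}) if issuer else {}
    values: dict[str, str] = {}
    positions: dict[str, int] = {}
    for name in FIELD_PATTERNS:
        found = find_field(name, lines, template.get(name))
        if found:
            values[name], positions[name] = found
    # Update the template so the next document from this issuer starts
    # its search on the lines where the fields were found this time.
    if issuer:
        store.templates[issuer[0]] = positions
    return values


store = TemplateStore()
fields = extract_fields(
    ["ACME SRL  CUI: RO123456", "Factura nr. F-0042",
     "Data: 15.03.2023", "Total: 1,190.00"],
    store,
)
print(fields)
```

A real system would more plausibly rely on layout coordinates and trained models than on line indices and regular expressions; the sketch only illustrates the general idea of labeling extracted values semantically and reusing what was learned from earlier documents of the same issuer.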

  • Issue Year: 23/2023
  • Issue No: 4
  • Page Range: 327-342
  • Page Count: 16
  • Language: English