Advanced Formatting of Delimited Big Data with Python

Author(s): Ivan Drankov, Yassen Gorbounov
Subject(s): Social Sciences, Education, Library and Information Science, Information Architecture, Electronic information storage and retrieval, Education and training, Other, Higher Education
Published by: Нов български университет
Keywords: Big Data; Python; Text processing; XML; custom built; ease of use

Summary/Abstract: With the growing demand for big data applications, the requirements for data processing tools are constantly increasing. The main problems associated with big data include size, inaccurate data, irrelevant data fields, inconsistent formatting, and duplicate data, and most conventional tools are not designed for such large workloads. Formatting large amounts of data can cause many problems during processing. Although this data is usually delimited, its sheer volume rules out visual software, which is designed to run on limited amounts of data. On the other hand, regular expressions, which are built into many programming languages such as Perl and tools like grep (global regular expression print), are complex and require in-depth knowledge, which limits the scope of their application. This article presents a method for working with large amounts of data, developed in the Python programming language. The method is highly flexible: it is configured through an XML file and achieves a high degree of automation.
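
The abstract does not reproduce the authors' code or their XML schema, so the following is only a minimal sketch of the general idea: reformatting a delimited file row by row in Python, driven by an XML configuration. The configuration layout (`format`, `input`, `output`, `column` and their attributes) and the function names `load_config` and `reformat` are assumptions invented for this illustration, not the article's actual design.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical configuration schema, invented for this sketch.
CONFIG_XML = """
<format>
  <input delimiter=";"/>
  <output delimiter=","/>
  <column index="0" strip="true"/>
  <column index="2" strip="true" upper="true"/>
</format>
"""


def load_config(xml_text):
    """Parse the (assumed) XML layout into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    columns = [
        {
            "index": int(col.get("index")),
            "strip": col.get("strip") == "true",
            "upper": col.get("upper") == "true",
        }
        for col in root.findall("column")
    ]
    return {
        "in_delim": root.find("input").get("delimiter"),
        "out_delim": root.find("output").get("delimiter"),
        "columns": columns,
    }


def reformat(src, dst, config):
    """Stream rows from src to dst, keeping only the configured columns.

    Rows are handled one at a time, so memory use does not grow with
    the size of the input.
    """
    reader = csv.reader(src, delimiter=config["in_delim"])
    writer = csv.writer(dst, delimiter=config["out_delim"], lineterminator="\n")
    for row in reader:
        out = []
        for col in config["columns"]:
            # Short rows yield an empty field instead of raising an error.
            value = row[col["index"]] if col["index"] < len(row) else ""
            if col["strip"]:
                value = value.strip()
            if col["upper"]:
                value = value.upper()
            out.append(value)
        writer.writerow(out)
```

For real files, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module documentation recommends; any in-memory text stream such as `io.StringIO` works for a quick check.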

  • Issue Year: 16/2020
  • Issue No: 1
  • Page Range: 8-11
  • Page Count: 4
  • Language: English