DUOMENŲ IŠRINKIMO INTERNETO PUSLAPIUOSE ALGORITMAS, PAREMTAS DUOMENŲ TARPUSAVIO PANAŠUMU
AN ALGORITHM FOR DATA EXTRACTION FROM WEB PAGES BASED ON DATA SIMILARITIES

Author(s): Kiril Griazev, Simona Ramanauskaitė
Subject(s): Media studies, Information Architecture, ICT Information and Communications Technologies
Published by: Vilniaus Universiteto Leidykla
Keywords: data extraction; data parsing; data similarity

Summary/Abstract: The paper analyses problems with data extraction from web pages and proposes a solution. The analysis showed that data-based algorithms are more popular than path-based data extraction. We propose a new data retrieval algorithm based on the similarity of web page data to controlled data. The proposed algorithm was applied to the retrieval of currency exchange rate data, and the efficiency of its prototype was evaluated by comparing it with other products. The research showed that the proposed algorithm, although it requires controlled data and is more suitable for retrieving constantly changing data, is more efficient than other similar products.
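
The abstract does not describe the algorithm's steps, so the following is only an illustrative sketch of the general idea of locating page data by its similarity to controlled (reference) data, not the authors' method. All names (`TextCollector`, `find_similar_values`, `tolerance`) and the relative-difference similarity measure are assumptions made for this example.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text fragments of an HTML document, in order."""

    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.fragments.append(text)


# Plain decimal numbers with either "." or "," as the decimal separator.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def find_similar_values(html, controlled, tolerance=0.02):
    """Map each controlled key to the closest page number within `tolerance`.

    `controlled` maps a label (for example a currency code) to an expected,
    non-zero value. Returns {key: (fragment_index, page_value)}; keys with
    no sufficiently similar number on the page are omitted.
    Hypothetical helper: the similarity measure is an assumption.
    """
    collector = TextCollector()
    collector.feed(html)
    collector.close()

    matches = {}
    for key, expected in controlled.items():
        if expected == 0:
            continue  # relative difference is undefined for zero
        best = None
        for index, fragment in enumerate(collector.fragments):
            for token in NUMBER.findall(fragment):
                value = float(token.replace(",", "."))
                diff = abs(value - expected) / abs(expected)
                if diff <= tolerance and (best is None or diff < best[0]):
                    best = (diff, index, value)
        if best is not None:
            matches[key] = (best[1], best[2])
    return matches


if __name__ == "__main__":
    page = (
        "<table><tr><td>USD</td><td>1,0712</td></tr>"
        "<tr><td>GBP</td><td>0.8561</td></tr></table>"
    )
    # Controlled data: approximate rates known from a trusted source.
    print(find_similar_values(page, {"USD": 1.07, "GBP": 0.85}))
```

In this sketch the controlled values only need to be approximately right: the position of the matching fragment on the page, rather than a fixed path, identifies where the current value is published.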

  • Issue Year: 2017
  • Issue No: 1 (47)
  • Page Range: 73-79
  • Page Count: 7
  • Language: Lithuanian