Duomenų tyrybos sistemų galimybių tyrimas įvairių apimčių duomenims analizuoti
INVESTIGATION OF THE ABILITIES OF DATA MINING SYSTEMS TO ANALYSE VARIOUS VOLUME DATASETS

Author(s): Kotryna Paulauskienė, Olga Kurasova
Subject(s): Cultural Essay, Political Essay, Societal Essay
Published by: Vilniaus Universiteto Leidykla

Summary/Abstract: The aim of the paper is to determine what volume of data popular data mining systems can analyse within a reasonable period of time when solving classification and clustering problems. Three open source data mining systems are investigated: WEKA, KNIME, and ORANGE. The experiments were carried out with eight datasets in which the number of attributes was fixed at 100 and the number of instances ranged from 5000 to 600 000. The experimental investigation has shown that datasets of more than 50 000 instances are too large for the ORANGE system; to analyse larger datasets, WEKA or KNIME must be used. Datasets of more than 200 000 instances are too large for WEKA and KNIME; however, when simple classification methods are used, both systems are able to handle 400 000 instances, and KNIME can handle 600 000. The results have shown that KNIME can handle larger datasets than WEKA when some classification methods are applied. The classification accuracy achieved by the methods implemented in the systems is sufficiently high.

  • Issue Year: 2013
  • Issue No: 65
  • Page Range: 85-95
  • Page Count: 11
  • Language: Lithuanian