Lasy separujące dla danych symbolicznych jako narzędzie wykrywania obserwacji odstających
Isolation Forests for Symbolic Data as a Tool for Outlier Mining

Author(s): Marcin Pełka, Andrzej Dudek
Subject(s): Socio-Economic Research
Published by: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Keywords: symbolic data analysis; isolation forest; outliers

Summary/Abstract: Aim: Outlier detection is a key part of every data analysis. Although the literature offers many definitions of outliers, all of them emphasise that outliers are objects that differ in some way from the other objects in the dataset. Many approaches have been proposed, compared, and analysed for classical data. However, only a few studies deal with the problem of outlier detection in symbolic data analysis. The paper aims to propose an adaptation of the isolation forest to symbolic data. Methodology: An isolation forest for symbolic data is used to detect outliers in four different artificial datasets with a known cluster structure and a known number of outliers. Results: The results show that the isolation forest for symbolic data is a fast and efficient tool for outlier mining. Implications and recommendations: As the isolation forest for symbolic data appears to be an efficient tool for outlier detection in artificial data, further studies should focus on real datasets that contain outliers (e.g. a credit card fraud dataset), and this approach should be compared with other outlier mining tools (e.g. DBSCAN). The authors recommend using the same initial settings for the isolation forest for symbolic data as those proposed for the isolation forest for classical data. Originality/value: This paper is the first of its kind, focusing not only on the problem of outlier detection in general, but also extending the well-known isolation forest model to symbolic data.

  • Issue Year: 28/2024
  • Issue No: 1
  • Page Range: 1-10
  • Page Count: 10
  • Language: English