The use of data mining models in solving the problem of imbalanced classes based on the example of an online marketing campaign

Author(s): Mariusz Łapczyński, Jerzy Surma
Subject(s): Economy
Published by: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Keywords: C&RT; Random Forest; imbalanced class problem; online social network; banner ad campaign

Summary/Abstract: While building predictive models in analytical CRM, researchers often encounter the problem of imbalanced classes (a skewed distribution of the dependent variable): the number of observations belonging to one category of the dependent variable is much lower than the number belonging to the other category. The problem arises in areas such as churn analysis, customer acquisition models, and cross-selling and up-selling models. The purpose of the paper is to present a predictive model built to predict the response of Internet users to banner advertising. The dataset used in the study came from an online social network that offers advertisers banner campaigns targeting its users. The advertising campaign of a cosmetics company was carried out in the autumn of 2010 and was mainly targeted at young women. Each user of the service was described by 115 independent variables, 3 of which were demographic (sex, age, education), while the remaining 112 referred to the user’s online activity. The problem of imbalanced classes appeared during model building because few users clicked on the banner ad: of 81,000 cases, only 207 were positive reactions to the banner, approximately 0.25% of all cases. Two popular data mining tools were used in the study: C&RT decision trees and Random Forest. The second goal of the paper is to compare the performance of the predictive models based on these two tools.
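
To make the scale of the imbalance concrete, the following minimal Python sketch works through the two figures quoted in the abstract (81,000 cases, 207 clicks). The "always predict no click" baseline and the function name `baseline_metrics` are illustrative assumptions and are not taken from the paper; they only show why plain accuracy can be misleading on data this skewed.

```python
# Figures quoted in the abstract.
n_cases = 81_000   # users exposed to the banner
n_clicks = 207     # positive reactions (clicks)


def baseline_metrics(n_total, n_positive):
    """Accuracy and recall of a hypothetical model that always predicts 'no click'.

    This trivial baseline is an assumption used for illustration only.
    """
    true_negatives = n_total - n_positive   # every non-clicker is classified correctly
    true_positives = 0                      # no clicker is ever identified
    accuracy = (true_negatives + true_positives) / n_total
    recall = true_positives / n_positive
    return accuracy, recall


positive_rate = n_clicks / n_cases
imbalance_ratio = (n_cases - n_clicks) / n_clicks   # negatives per positive
accuracy, recall = baseline_metrics(n_cases, n_clicks)

print(f"positive rate:     {positive_rate:.4%}")
print(f"negatives per hit: {imbalance_ratio:.1f}")
print(f"baseline accuracy: {accuracy:.4%}")
print(f"baseline recall:   {recall:.1%}")
```

Under these assumptions the trivial baseline is correct in more than 99.7% of cases yet finds none of the users who clicked, which is why studies of imbalanced data typically look at measures such as recall or precision for the rare class in addition to overall accuracy.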

  • Issue Year: 2015
  • Issue No: 49
  • Page Range: 9-19
  • Page Count: 11
  • Language: English