SMOTE vs. Random Undersampling for Imbalanced Data - Car Ownership Demand Model


Author(s): Wuttikrai Chaipanha, Patiphan Kaewwichian
Subject(s): Methodology and research technology, Transport / Logistics
Published by: Žilinská univerzita v Žilině
Keywords: tour-based model; multiclass classification; k-nearest neighbors; activity-based model

Summary/Abstract: Because the number of cars reflects each person's travel behavior in a specific location, the car ownership demand model plays a dominant role in travel demand analysis, helping to explain individual and household travel behavior in each area. However, the data from the study project for the master plan of the Khon Kaen expressway were imbalanced; that is, the majority class and the minority class were not equal in size. Before developing a machine learning model, this study proposed balancing the data with oversampling and undersampling techniques. The data balanced with SMOTE (Synthetic Minority Oversampling Technique), combined with kNN (k-nearest neighbors, k = 5), gave better results than the other algorithms studied. For the rural and suburban areas, two types of region with very different imbalance ratios, the TPR (true positive rate) before balancing was 46.9 % and 46.4 %, respectively. After balancing, the TPR values rose to 63.5 % and 54.4 %, respectively.
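
The two balancing strategies and the TPR metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (smote, random_undersample, true_positive_rate), the two-dimensional toy data, and the class sizes are all assumptions made for illustration, and the code uses only the Python standard library. The SMOTE step follows the usual idea of interpolating between a minority sample and one of its k nearest minority neighbours (k = 5 here, matching the value quoted in the abstract).

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Return n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours are searched within the minority class only.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of the majority class, without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


def true_positive_rate(y_true, y_pred, positive):
    """TPR (recall) for one class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy imbalanced data (hypothetical): 100 majority vs. 10 minority points.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

# Oversampling: grow the minority class to the majority size.
synthetic = smote(minority, len(majority) - len(minority), k=5, rng=rng)
oversampled_minority = minority + synthetic

# Undersampling: shrink the majority class to the minority size.
undersampled_majority = random_undersample(majority, len(minority), rng=rng)

print(len(oversampled_minority), len(undersampled_majority))
```

Oversampling keeps every original observation and adds interpolated ones, whereas undersampling discards majority observations; which trade-off suits a given area type is the question the study examines. In practice, a maintained library such as imbalanced-learn (SMOTE, RandomUnderSampler) would normally be preferred over a hand-written version like this one.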


Journal: Komunikácie - vedecké listy Žilinskej univerzity v Žiline

Issue Year: 24/2022
Issue No: 3
Page Range: 105-115
Page Count: 11
Language: English

