Optimisations of four imputation frameworks for performance exploring based on decision tree algorithms in big data analysis problems

Bektas, Jale; Ibrikci, Turgay

Date

2022

Authors

Bektas, Jale

Ibrikci, Turgay

Publisher

Inderscience Enterprises Ltd

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Missing values are a recurring problem in big data analysis, and a range of imputation strategies has been developed to handle them. This study examines four imputation frameworks, each proposing a novel perspective, based on expectation-maximisation (EM), self-organising map (SOM), K-means and multilayer perceptron (MLP). First, several transformation steps were applied: normalisation, standardisation, interquartile range and wavelet. The imputed datasets were then analysed with decision tree algorithms (DTAs) whose parameters were optimised. The analyses showed that DTAs were not strikingly affected by any data transformation technique except interquartile range. Even though the dataset had a missing value ratio of 33.73%, the EM imputation framework provided a performance increase of 0.42% to 3.14%. DTAs based on the C4.5 and NBTree algorithms were more stable across all big imputed datasets. Furthermore, C4.5 can be proposed as the basis for realistic performance measurement of any preprocessing experiment, to avoid excessive time complexity.
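
To make the idea of a cluster-based imputation framework concrete, the following is a minimal sketch of K-means-style imputation in plain Python. It is not the authors' framework, whose details are not given in this record. The function names `kmeans_impute` and `_dist`, the toy data, the deterministic initialisation and the fixed iteration count are all assumptions made for illustration.

```python
def _dist(row, centroid):
    """Mean squared difference over the features that are observed in `row`."""
    pairs = [(v, c) for v, c in zip(row, centroid) if v is not None]
    return sum((v - c) ** 2 for v, c in pairs) / len(pairs)


def kmeans_impute(rows, k=2, n_iter=20):
    """Fill None entries with the matching coordinate of the row's centroid.

    Assumes every row has at least one observed value and every column
    has at least one observed value (hypothetical helper, not a library API).
    """
    n_features = len(rows[0])

    # Start from column means so that every row is complete on the first pass.
    col_means = []
    for j in range(n_features):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[col_means[j] if v is None else v for j, v in enumerate(r)]
              for r in rows]

    # Deterministic initialisation (an assumption): first k mean-filled rows.
    centroids = [list(filled[i]) for i in range(k)]

    for _ in range(n_iter):
        # Assign each row to the nearest centroid using observed features only.
        labels = [min(range(k), key=lambda c: _dist(r, centroids[c]))
                  for r in rows]

        # Recompute each centroid from the currently filled rows.
        for c in range(k):
            members = [filled[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]

        # Re-impute: observed values are kept, missing ones take the centroid's.
        filled = [[centroids[labels[i]][j] if v is None else v
                   for j, v in enumerate(r)]
                  for i, r in enumerate(rows)]
    return filled


# Toy data with two obvious groups; None marks a missing value.
rows = [
    [1.0, 1.0],
    [9.0, 9.0],
    [1.2, None],
    [8.8, None],
    [1.1, 0.9],
    [9.1, 9.2],
]
filled = kmeans_impute(rows, k=2)
for original, completed in zip(rows, filled):
    print(original, "->", completed)
```

On this toy data each missing entry settles near the mean of the observed values in its own cluster rather than near the global column mean, which is the motivation for cluster-based imputation. The completed table could then be passed to any decision tree learner for evaluation.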

Keywords

preprocessing, data mining, multiple imputation, decision tree classifier, machine learning, big data analytics

Source

International Journal of Computational Science and Engineering

WoS Q Value

N/A

Scopus Q Value

Q2

Volume

25

Issue

5

Link

https://doi.org/10.1504/IJCSE.2022.10051198
https://hdl.handle.net/20.500.14669/2910

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
