Optimisations of four imputation frameworks for performance exploring based on decision tree algorithms in big data analysis problems

dc.authorid: IBRIKCI, Turgay/0000-0003-1321-2523
dc.authorid: , Jale/0000-0002-8793-1486
dc.contributor.author: Bektas, Jale
dc.contributor.author: Ibrikci, Turgay
dc.date.accessioned: 2025-01-06T17:44:04Z
dc.date.available: 2025-01-06T17:44:04Z
dc.date.issued: 2022
dc.description.abstract: How to treat missing values is a recurring problem in big data analysis, and various imputation strategies have been developed to address it. This study focuses on four imputation frameworks that offer novel perspectives based on expectation-maximisation (EM), self-organising map (SOM), K-means and multilayer perceptron (MLP). Initially, several transformation steps (normalisation, standardisation, interquartile range and wavelet) were applied. The imputed datasets were then analysed using decision tree algorithms (DTAs) with optimised parameters. These analyses showed that DTAs were not strikingly affected by any data transformation technique except interquartile range. Even though the dataset contains a missing value ratio of 33.73%, the EM imputation framework provided a performance increase of 0.42% to 3.14%. DTAs based on the C4.5 and NBTree algorithms were more stable across all big imputed datasets. Furthermore, performance measurement based on C4.5 can be proposed as a realistic way to evaluate any preprocessing experiment while avoiding time complexity.
dc.identifier.doi: 10.1504/IJCSE.2022.10051198
dc.identifier.endpage: 531
dc.identifier.issn: 1742-7185
dc.identifier.issn: 1742-7193
dc.identifier.issue: 5
dc.identifier.scopus: 2-s2.0-85140631443
dc.identifier.scopusquality: Q2
dc.identifier.startpage: 523
dc.identifier.uri: https://doi.org/10.1504/IJCSE.2022.10051198
dc.identifier.uri: https://hdl.handle.net/20.500.14669/2910
dc.identifier.volume: 25
dc.identifier.wos: WOS:000869821500006
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Inderscience Enterprises Ltd
dc.relation.ispartof: International Journal of Computational Science and Engineering
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_20241211
dc.subject: preprocessing
dc.subject: data mining
dc.subject: multiple imputation
dc.subject: decision tree classifier
dc.subject: machine learning
dc.subject: big data analytics
dc.title: Optimisations of four imputation frameworks for performance exploring based on decision tree algorithms in big data analysis problems
dc.type: Article
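
The abstract names K-means as one of the four bases for imputation but does not describe the procedure. As a rough illustration only, the general idea of cluster-based imputation might be sketched as below. This is not the authors' framework: the function name `kmeans_impute`, its parameters, and the toy array `X_demo` are all hypothetical, and the sketch assumes every column has at least one observed value.

```python
import numpy as np


def kmeans_impute(X, n_clusters=2, n_iter=10, seed=0):
    """Fill NaN entries with the matching coordinate of the nearest centroid.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from a simple fill: the mean of the observed values per column.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    # Pick distinct rows of the filled data as initial centroids.
    rng = np.random.default_rng(seed)
    start = rng.choice(len(filled), size=n_clusters, replace=False)
    centroids = filled[start]

    for _ in range(n_iter):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        dists = ((filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Recompute centroids; an empty cluster keeps its previous centroid.
        for k in range(n_clusters):
            members = filled[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)

        # Overwrite only the missing cells; observed cells are never changed.
        filled = np.where(missing, centroids[labels], X)

    return filled


# Toy data with two missing cells (values are made up for the example).
X_demo = np.array([
    [1.0, 2.0],
    [1.2, np.nan],
    [0.9, 2.1],
    [8.0, 9.0],
    [np.nan, 9.2],
    [8.1, 8.9],
])
X_imputed = kmeans_impute(X_demo, n_clusters=2)
```

The imputed array could then be passed to a decision tree classifier, which is the evaluation step the abstract describes. The choice of a mean fill as the starting point is an assumption of this sketch.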
