A Novel Comparative Approach: Logistic Regression Enhanced by Bat Optimization Versus Logistic Regression Enhanced by Deep Belief Network for Remote Homologous Protein Detection

dc.authoridGEMCİ, FAHRİYE/0000-0003-0961-5266
dc.authoridIBRIKCI, Turgay/0000-0003-1321-2523
dc.contributor.authorGemci, Fahriye
dc.contributor.authorIbrikci, Turgay
dc.contributor.authorCevik, Ulus
dc.date.accessioned2026-02-27T07:33:31Z
dc.date.available2026-02-27T07:33:31Z
dc.date.issued2025
dc.description.abstractIdentifying remote homologous proteins is an important problem in computational biology. An experimental study was conducted to address it using machine learning and natural language processing algorithms. The SCOP 1.53 dataset, which contains 54 families, was used. Two new designs were developed in this study. As a preprocessing step, numerical features were extracted from protein sequences using the TF-IDF vectorization method. Data augmentation was then performed using the SMOTE-Tomek algorithm. The same preprocessing steps were used in both methods. The first method is a two-stage classifier combining Logistic Regression and a Deep Belief Network (LR-DBN), which achieved an average accuracy of 77% and an F1 score of 75%. The second is a classifier using Logistic Regression with Bat optimization (LR-B), which achieved an average accuracy of 84% and an F1 score of 86%. LR-B with SMOTE-Tomek performed best, with an ROC-AUC score of 89%. Although LR-DBN with SMOTE-Tomek performed slightly worse than LR-B with SMOTE-Tomek, it still performed well in detecting remote homologous proteins.
dc.identifier.doi10.1109/ACCESS.2025.3641298
dc.identifier.endpage209728
dc.identifier.issn2169-3536
dc.identifier.startpage209723
dc.identifier.urihttp://dx.doi.org/10.1109/ACCESS.2025.3641298
dc.identifier.urihttps://hdl.handle.net/20.500.14669/4621
dc.identifier.volume13
dc.identifier.wosWOS:001641531500044
dc.indekslendigikaynakWeb of Science
dc.language.isoen
dc.publisherIEEE-Inst Electrical Electronics Engineers Inc
dc.relation.ispartofIEEE Access
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_20260302
dc.subjectProteins
dc.subjectLogistic regression
dc.subjectOptimization
dc.subjectAmino acids
dc.subjectTerrain factors
dc.subjectMachine learning algorithms
dc.subjectSoftware
dc.subjectFeature extraction
dc.subjectData models
dc.subjectPrediction algorithms
dc.subjectBat algorithm
dc.subjectdeep belief network
dc.subjectimbalanced data
dc.subjectlogistic regression
dc.subjectprotein remote homology
dc.subjectsmote-tomek
dc.titleA Novel Comparative Approach: Logistic Regression Enhanced by Bat Optimization Versus Logistic Regression Enhanced by Deep Belief Network for Remote Homologous Protein Detection
dc.typeArticle
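
The abstract states that numerical features were obtained from protein sequences with TF-IDF vectorization, but this record does not say how sequences were tokenized. The following is a minimal sketch of that one step only, assuming overlapping 3-mers as "words", a smoothed IDF, and L2 normalization. The function names, the value of k, and the toy sequences are hypothetical and not taken from the paper. SMOTE-Tomek resampling, Bat optimization, and the Deep Belief Network are not shown.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a protein sequence into overlapping k-mers (assumed tokenization)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(sequences, k=3):
    """Return one {k-mer: weight} dict per sequence, L2-normalized."""
    docs = [Counter(kmers(s, k)) for s in sequences]
    n = len(docs)
    # Document frequency: number of sequences that contain each k-mer.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        # Term frequency times smoothed IDF: log((1 + n) / (1 + df)) + 1.
        vec = {
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


# Hypothetical toy sequences, for illustration only.
toy = ["MKVLAAGIV", "MKVLSAGIV", "GGSPWTQ"]
vectors = tfidf(toy)
```

In this sketch a 3-mer that occurs in only one sequence receives a higher weight than one shared across sequences, which is the property that makes TF-IDF features usable as input to a classifier such as logistic regression.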
