A Novel Comparative Approach: Logistic Regression Enhanced by Bat Optimization Versus Logistic Regression Enhanced by Deep Belief Network for Remote Homologous Protein Detection


Date

2025

Journal Title

Journal ISSN

Volume Title

Publisher

IEEE-Inst Electrical Electronics Engineers Inc

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Identifying remote homologous proteins is an important task in computational biology. An experimental study was conducted to address it using machine learning and natural language processing algorithms. The SCOP 1.53 dataset, which contains 54 families, was used. Two new designs were developed in this study. As a preprocessing step, numerical features were extracted from protein sequences using TF-IDF vectorization. Data augmentation was then performed with the SMOTE-Tomek algorithm. The same preprocessing steps were used in both methods. The first new method is a two-stage classifier combining Logistic Regression and a Deep Belief Network (LR-DBN), which achieved an average accuracy of 77% and an F1 score of 75%. The second is a Logistic Regression classifier optimized with the Bat algorithm (LR-B), which achieved an average accuracy of 84% and an F1 score of 86%. LR-B with SMOTE-Tomek performed best, with an ROC-AUC score of 89%. Although LR-DBN with SMOTE-Tomek performed slightly worse than LR-B with SMOTE-Tomek, it still performed well in detecting remote homologous proteins.
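The LR-B idea summarized above can be illustrated with a minimal, dependency-free sketch. It rests on assumptions the abstract does not state: sequences are tokenized into overlapping k-mers before TF-IDF weighting (k = 2 here), and the basic bat algorithm (frequency-tuned velocities, loudness A_i, pulse rate r_i) searches directly over the logistic-regression weights by minimizing the training log-loss, whereas the authors may instead have used it to tune hyperparameters. The helper names (`kmers`, `tfidf_matrix`, `bat_optimize`), all parameter values, and the six toy sequences are hypothetical; the SMOTE-Tomek resampling step and the DBN stage are not shown.

```python
import math
import random
from collections import Counter


def kmers(seq, k=3):
    """Split an amino-acid sequence into overlapping k-mers (assumed tokenization)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_matrix(seqs, k=3):
    """Return (vocab, rows): L2-normalized TF-IDF vectors over k-mer tokens."""
    docs = [Counter(kmers(s, k)) for s in seqs]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    # Smoothed IDF, same formula as scikit-learn's default: ln((1+n)/(1+df)) + 1
    idf = {t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1 for t in vocab}
    rows = []
    for d in docs:
        vec = [d[t] * idf[t] for t in vocab]  # raw k-mer count times IDF
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def sigmoid(z):
    z = max(-500.0, min(500.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Logistic model: weights w[:-1] on the features, bias in w[-1]."""
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))


def log_loss(w, X, y, eps=1e-12):
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def bat_optimize(f, dim, n_bats=20, iters=200, fmin=0.0, fmax=2.0,
                 alpha=0.9, gamma=0.9, lo=-5.0, hi=5.0, seed=0):
    """Minimize f over the box [lo, hi]^dim with the basic bat algorithm."""
    rng = random.Random(seed)

    def clip(v):
        return max(lo, min(hi, v))

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    loud = [1.0] * n_bats                       # loudness A_i
    r0 = [rng.random() for _ in range(n_bats)]  # initial pulse rates r_i^0
    rate = r0[:]
    fit = [f(p) for p in pos]
    b = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[b][:], fit[b]
    for t in range(1, iters + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Frequency-tuned velocity and position update
            freq = fmin + (fmax - fmin) * rng.random()
            vel[i] = [v + (x - xb) * freq for v, x, xb in zip(vel[i], pos[i], best)]
            cand = [clip(x + v) for x, v in zip(pos[i], vel[i])]
            if rng.random() > rate[i]:
                # Local random walk around the best solution, scaled by mean loudness
                cand = [clip(xb + 0.01 * mean_loud * rng.gauss(0.0, 1.0)) for xb in best]
            cand_fit = f(cand)
            # Accept if not worse and the bat is still "loud" enough
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                rate[i] = r0[i] * (1.0 - math.exp(-gamma * t))
            if cand_fit <= best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


# Toy data: two hypothetical "families" of short sequences
seqs = ["MKVLAAGIVG", "MKVLAAGLVG", "MKILAAGIVA",
        "GSHMTEYKLV", "GSHMSEYKLI", "GTHMTEYRLV"]
labels = [1, 1, 1, 0, 0, 0]
vocab, X = tfidf_matrix(seqs, k=2)
w, loss = bat_optimize(lambda w: log_loss(w, X, labels), dim=len(vocab) + 1)
preds = [int(predict_proba(w, x) >= 0.5) for x in X]
print(f"vocab size={len(vocab)}, training log-loss={loss:.3f}, predictions={preds}")
```

The explicit loops only make each update rule visible; in practice a library vectorizer such as scikit-learn's `TfidfVectorizer` with a character-level analyzer would replace the hand-written TF-IDF, and the search would be scored on held-out folds rather than on the training log-loss.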

Description

Keywords

Proteins, Logistic regression, Optimization, Amino acids, Terrain factors, Machine learning algorithms, Software, Feature extraction, Data models, Prediction algorithms, Bat algorithm, deep belief network, imbalanced data, logistic regression, protein remote homology, smote-tomek

Source

IEEE Access

WoS Quartile

Scopus Quartile

Volume

13

Issue

Citation