ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature

dc.authorid: Atalay, Volkan/0000-0001-7850-0601
dc.authorid: Cetin-Atalay, Rengul/0000-0003-2408-6606
dc.authorid: Dogan, Tunca/0000-0002-1298-9763
dc.authorid: Dalkiran, Alperen/0000-0002-4243-7281
dc.authorid: Rifaioglu, Ahmet Sureyya/0000-0001-6717-4767
dc.authorid: Martin, Maria-Jesus/0000-0001-5454-2815
dc.contributor.author: Dalkiran, Alperen
dc.contributor.author: Rifaioglu, Ahmet Sureyya
dc.contributor.author: Martin, Maria Jesus
dc.contributor.author: Cetin-Atalay, Rengul
dc.contributor.author: Atalay, Volkan
dc.contributor.author: Dogan, Tunca
dc.date.accessioned: 2025-01-06T17:44:06Z
dc.date.available: 2025-01-06T17:44:06Z
dc.date.issued: 2018
dc.description.abstract: Background: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. In addition, most of the previous methods incorporated only a single input feature type, which limits their applicability to the wide functional space. Here, we propose a novel enzymatic function prediction tool, ECPred, based on an ensemble of machine learning classifiers. Results: In ECPred, each EC number constitutes an individual class and therefore has an independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach exploiting the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total, including 6 main classes, 55 subclass classes, 163 sub-subclass classes and 634 substrate classes. The proposed method is tested and compared with the state-of-the-art enzyme function prediction tools using independent temporal hold-out and no-Pfam datasets constructed during this study. Conclusions: ECPred is presented both as a stand-alone and a web-based tool to provide probabilistic enzymatic function predictions (at all five levels of EC) for uncharacterized protein sequences. The datasets of this study will also be a valuable resource for future benchmarking studies. ECPred is available for download, together with all of the datasets used in this study, at: https://github.com/cansyl/ECPred. The ECPred web server can be accessed through http://cansyl.metu.edu.tr/ECPred.html.
dc.description.sponsorship: YOK OYP scholarships
dc.description.sponsorship: AD and ASR were supported by YOK OYP scholarships. The funding body did not play any role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
dc.identifier.doi: 10.1186/s12859-018-2368-y
dc.identifier.issn: 1471-2105
dc.identifier.pmid: 30241466
dc.identifier.scopus: 2-s2.0-85053685317
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.1186/s12859-018-2368-y
dc.identifier.uri: https://hdl.handle.net/20.500.14669/2926
dc.identifier.volume: 19
dc.identifier.wos: WOS:000445215600004
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: PubMed
dc.language.iso: en
dc.publisher: BMC
dc.relation.ispartof: BMC Bioinformatics
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_20241211
dc.subject: Protein sequence
dc.subject: EC numbers
dc.subject: Function prediction
dc.subject: Machine learning
dc.subject: Benchmark datasets
dc.title: ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature
dc.type: Article

Files