Machine Learning-Based Classification of Albanian Wines by Grape Variety, Using Phenolic Compound Dataset

dc.authoridKelebek, Hasim/0000-0002-8419-3019
dc.authoridTopi, Dritan/0000-0001-7852-5374
dc.authoridGÜÇLÜ, GAMZE/0000-0001-7317-6101
dc.authoridSelli, Serkan/0000-0003-0450-2668
dc.authoridTopi, Ardiana/0000-0002-3655-6516
dc.contributor.authorTopi, Ardiana
dc.contributor.authorKasaj, Agim
dc.contributor.authorHudhra, Daniel
dc.contributor.authorKelebek, Hasim
dc.contributor.authorGuclu, Gamze
dc.contributor.authorSelli, Serkan
dc.contributor.authorTopi, Dritan
dc.date.accessioned2026-02-27T07:33:33Z
dc.date.available2026-02-27T07:33:33Z
dc.date.issued2025
dc.description.abstractWine phenolics serve as robust chemical signatures correlated with grape variety, processing, and regional identity. This study explores the potential of machine learning algorithms, combined with the phenolic profiles of Albanian wines, to classify the wines by grape variety; geographic origin analysis was conducted as a preliminary exploration. The phenolic compound dataset included white and red wines from the 2017 to 2021 vintages. Five supervised algorithms were applied: Support Vector Machine (SVM), Random Forest, XGBoost, Logistic Regression, and K-Nearest Neighbors. High classification accuracy was achieved, with SVM reaching 100% under Leave-One-Out Cross-Validation (LOOCV). To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) and stratified cross-validation were applied. Random Forest feature importance consistently highlighted trans-Fertaric acid and Procyanidin B3 as the dominant discriminants. Parallel coordinates plots showed clear varietal patterns driven by phenolic differences, while PCA and hierarchical clustering revealed unsupervised groupings consistent with wine type and maceration level. Permutation testing (1000 iterations) confirmed that model performance was not due to chance. These findings show that a small set of phenolic markers can provide high classification accuracy, supporting chemically based wine authentication. Although the dataset is relatively small, thorough cross-validation, non-redundant modeling, and chemical interpretability provide a solid foundation for scalable methods. Future work will expand the dataset and explore sensor-based phenolic measurement to enable rapid wine authentication.
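The abstract reports LOOCV accuracy backed by a 1000-iteration permutation test. The following is a minimal, hypothetical sketch of that validation logic, not the authors' code. It uses a hand-written k-nearest-neighbors classifier as a stand-in (KNN is one of the five algorithms listed), on synthetic two-feature data. The data values, `k=3`, the random seed, and all function names are illustrative assumptions. SMOTE, SVM, and feature importance are not covered.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: two well-separated classes ("A", "B"), two features each.
# These values are synthetic placeholders, NOT phenolic measurements from the study.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [1.0, 1.8], [1.3, 2.0],
     [5.0, 7.0], [5.2, 6.9], [4.9, 7.1], [5.1, 7.2], [5.0, 6.8], [5.3, 7.0]]
y = ["A"] * 6 + ["B"] * 6


def zscore_fit(rows):
    """Per-feature mean and population standard deviation, fit on training rows only."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def zscore_apply(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loocv_accuracy(X, y, k=3):
    """Leave-One-Out CV: each sample is predicted by a model fit on all the others."""
    correct = 0
    for i in range(len(X)):
        train_x = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        means, stds = zscore_fit(train_x)  # scaling fit inside the fold to avoid leakage
        scaled = [zscore_apply(r, means, stds) for r in train_x]
        pred = knn_predict(scaled, train_y, zscore_apply(X[i], means, stds), k)
        correct += pred == y[i]
    return correct / len(X)


def permutation_test(X, y, n_perm=1000, k=3, seed=0):
    """Fraction of label shuffles whose LOOCV accuracy reaches the observed accuracy."""
    rng = random.Random(seed)
    observed = loocv_accuracy(X, y, k)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        hits += loocv_accuracy(X, shuffled, k) >= observed
    # The +1 correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


observed, p_value = permutation_test(X, y)
print(f"LOOCV accuracy = {observed:.2f}, permutation p = {p_value:.4f}")
```

With real data, `X` would hold per-wine phenolic concentrations and `y` the grape variety. In practice, scikit-learn's `LeaveOneOut` and `permutation_test_score` would typically replace these hand-written loops.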
dc.identifier.doi10.3390/analytica6040043
dc.identifier.issn2673-4532
dc.identifier.issue4
dc.identifier.urihttp://dx.doi.org/10.3390/analytica6040043
dc.identifier.urihttps://hdl.handle.net/20.500.14669/4633
dc.identifier.volume6
dc.identifier.wosWOS:001645953700001
dc.indekslendigikaynakWeb of Science
dc.language.isoen
dc.publisherMDPI
dc.relation.ispartofAnalytica
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_20260302
dc.subjectwine phenolics
dc.subjectmachine learning
dc.subjectwine authenticity
dc.subjectAlbanian grape varieties
dc.subjectLOOCV
dc.subjectPCA
dc.subjectrandom forest
dc.subjectSVM classification
dc.titleMachine Learning-Based Classification of Albanian Wines by Grape Variety, Using Phenolic Compound Dataset
dc.typeArticle
