Machine Learning-Based Classification of Albanian Wines by Grape Variety, Using Phenolic Compound Dataset

Date

2025

Journal Title

Journal ISSN

Volume Title

Publisher

MDPI

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Wine phenolics serve as robust chemical signatures correlated with grape variety, processing, and regional identity. This study explores the potential of machine learning algorithms, combined with the phenolic profiles of Albanian wines, to classify the wines by grape variety. Geographic origin analysis was conducted as a preliminary exploration. The phenolic compound dataset included white and red wines spanning the 2017 to 2021 vintages. Five supervised algorithms were applied: Support Vector Machine (SVM), Random Forest, XGBoost, Logistic Regression, and K-Nearest Neighbors. High classification accuracy was achieved, with SVM reaching 100% under Leave-One-Out Cross-Validation (LOOCV). To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) and stratified cross-validation were applied. Random Forest feature importance consistently highlighted trans-fertaric acid and procyanidin B3 as the dominant discriminants. Parallel coordinates plots showed clear varietal patterns driven by phenolic differences, while PCA and hierarchical clustering confirmed unsupervised grouping consistent with wine type and maceration level. Permutation testing (1000 iterations) confirmed that model performance was not due to chance. These findings show that a small set of phenolic markers can deliver high classification accuracy, supporting chemically based wine authentication. Although the dataset is relatively small, thorough cross-validation, non-redundant modeling, and chemical interpretability provide a solid foundation for scalable methods. Future work will expand the dataset and explore sensor-based phenolic measurement to enable rapid wine authentication.
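
The validation scheme named in the abstract (leave-one-out cross-validation followed by a label-permutation test) can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: it swaps in a plain K-Nearest Neighbors classifier (one of the five algorithm families listed) in place of the SVM, and it runs on synthetic two-marker data generated on the spot. The function names, the class labels `variety_A` and `variety_B`, the marker values, and the choice of `k=3` are all assumptions made for illustration.

```python
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    # Rank training samples by squared Euclidean distance to x.
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = {}
    for i in order[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)


def loocv_accuracy(X, y, k=3):
    """Leave-one-out CV: hold out each sample once, train on all the others."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k) == y[i]
    return hits / len(X)


def permutation_p_value(X, y, n_permutations=1000, k=3, seed=0):
    """Compare the observed LOOCV accuracy with accuracies on shuffled labels."""
    rng = random.Random(seed)
    observed = loocv_accuracy(X, y, k)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between features and labels
        if loocv_accuracy(X, shuffled, k) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling itself as one permutation.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


def make_synthetic_data(n_per_class=10, seed=42):
    """Hypothetical data: two well-separated classes, two made-up marker values."""
    rng = random.Random(seed)
    centres = {"variety_A": (1.0, 5.0), "variety_B": (4.0, 1.0)}  # assumed values
    X, y = [], []
    for label, (c1, c2) in centres.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(c1, 0.3), rng.gauss(c2, 0.3)])
            y.append(label)
    return X, y


X, y = make_synthetic_data()
accuracy, p_value = permutation_p_value(X, y)
print(f"LOOCV accuracy: {accuracy:.2f}, permutation p-value: {p_value:.4f}")
```

A small p-value here only says that the accuracy is unlikely under shuffled labels. With real phenolic data, any resampling step such as SMOTE would need to be fitted inside each cross-validation fold so that the held-out sample never influences the synthetic ones.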

Description

Keywords

wine phenolics, machine learning, wine authenticity, Albanian grape varieties, LOOCV, PCA, random forest, SVM classification

Source

Analytica

WoS Q Value

Scopus Q Value

Volume

6

Issue

4

Citation