Multilabel voice disorder classification using raw waveforms

dc.contributor.author: Disken, Gokay
dc.date.accessioned: 2025-01-06T17:37:13Z
dc.date.available: 2025-01-06T17:37:13Z
dc.date.issued: 2024
dc.description.abstract: Automated voice disorder detection systems that distinguish pathological voices from healthy ones have been developed with the aid of machine learning methods. Both clinicians and patients can benefit from these systems, as they offer many advantages over invasive techniques. These systems can produce binary (healthy/pathological) or multiclass (healthy/selected pathologies) decisions. However, multiple disorders might exist in an individual's voice, and multilabel classification should be considered in such cases. To date, only a single report is available on this topic; it used hand-crafted features and a data augmentation technique to overcome class imbalances. In this study, a similar experimental setup is followed to investigate the suitability of raw voice signals as inputs for multilabel classification. A deep learning model that consists of residual blocks and a novel gating mechanism is proposed. The gating mechanism weighs the channels of a residual block's output based on both its output and the previous layer's output. Using a SincNet filterbank that operates directly on the raw waveform as the initial layer, 0.99 accuracy and 0.98 F1 score were observed for natural /a/ vowels of the Saarbruecken Voice Database, with time-domain augmentation used to balance the class samples. On the other hand, reducing the number of augmented samples decreased the performance for both systems, indicating the need for a balanced dataset to avoid oversampling underrepresented classes. The proposed architecture performed consistently better than ResNet18 with deep connected attention, which verified the effectiveness of the proposed gating mechanism.
dc.identifier.doi: 10.55730/1300-0632.4089
dc.identifier.issn: 1300-0632
dc.identifier.issn: 1303-6203
dc.identifier.issue: 4
dc.identifier.scopus: 2-s2.0-85200239297
dc.identifier.scopusquality: Q2
dc.identifier.trdizinid: 1252375
dc.identifier.uri: https://doi.org/10.55730/1300-0632.4089
dc.identifier.uri: https://search.trdizin.gov.tr/tr/yayin/detay/1252375
dc.identifier.uri: https://hdl.handle.net/20.500.14669/2138
dc.identifier.volume: 32
dc.identifier.wos: WOS:001280878700006
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: TR-Dizin
dc.language.iso: en
dc.publisher: Tubitak Scientific & Technological Research Council Turkey
dc.relation.ispartof: Turkish Journal of Electrical Engineering and Computer Sciences
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_20241211
dc.subject: Convolutional neural network
dc.subject: deep learning
dc.subject: multilabel classification
dc.subject: voice pathology
dc.title: Multilabel voice disorder classification using raw waveforms
dc.type: Article
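
The abstract describes a gating mechanism that weighs the channels of a residual block's output using both that output and the previous layer's output. This record does not give the exact formulation, so the following is only a minimal sketch of one possible reading: each input is averaged over time per channel, the two descriptors are concatenated, and a single linear layer followed by a sigmoid produces one gate per channel. The function names, the pooling choice, the single linear layer, and the random weights are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def global_avg_pool(x):
    """Average each channel over time; x is a list of channels (lists of samples)."""
    return [sum(channel) / len(channel) for channel in x]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_gate(block_out, prev_out, weights, bias):
    """Scale each channel of block_out by a gate in (0, 1).

    Assumed formulation: the gate depends on pooled descriptors of BOTH the
    residual block's output and the previous layer's output.
    weights has one row per block_out channel; each row has
    len(block_out) + len(prev_out) entries.
    """
    descriptor = global_avg_pool(block_out) + global_avg_pool(prev_out)
    gates = [
        sigmoid(sum(w * d for w, d in zip(row, descriptor)) + b)
        for row, b in zip(weights, bias)
    ]
    gated = [[g * s for s in channel] for g, channel in zip(gates, block_out)]
    return gated, gates


# Toy data: 4 channels, 8 time steps (illustrative sizes only).
random.seed(0)
C, T = 4, 8
prev_out = [[random.gauss(0, 1) for _ in range(T)] for _ in range(C)]
block_out = [[random.gauss(0, 1) for _ in range(T)] for _ in range(C)]
weights = [[random.gauss(0, 0.5) for _ in range(2 * C)] for _ in range(C)]
bias = [0.0] * C

gated, gates = channel_gate(block_out, prev_out, weights, bias)
print("per-channel gates:", [round(g, 3) for g in gates])
```

In a trained network the weights and bias would be learned together with the residual blocks; here they are random, so the printed gates carry no meaning beyond showing that every channel receives its own scale factor between 0 and 1.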
