Scale-invariant MFCCs for speech/speaker recognition

dc.contributor.author: Tufekci, Zekeriya
dc.contributor.author: Disken, Gokay
dc.date.accessioned: 2025-01-06T17:37:50Z
dc.date.available: 2025-01-06T17:37:50Z
dc.date.issued: 2019
dc.description.abstract: The feature extraction process is a fundamental part of speech processing. Mel frequency cepstral coefficients (MFCCs) are the most commonly used feature type in the speech/speaker recognition literature. However, the MFCC framework may face numerical issues or dynamic range problems, which degrade its performance. A practical solution to these problems is to add a constant to the filter-bank magnitudes before log compression, but this violates the scale-invariance property. In this work, a magnitude normalization and a multiplication constant are introduced to make the MFCCs scale-invariant and to avoid dynamic range expansion of nonspeech frames. Speaker verification experiments are conducted to show the effectiveness of the proposed scheme.
dc.identifier.doi: 10.3906/elk-1901-231
dc.identifier.endpage: 3762
dc.identifier.issn: 1300-0632
dc.identifier.issn: 1303-6203
dc.identifier.issue: 5
dc.identifier.scopus: 2-s2.0-85072597726
dc.identifier.scopusquality: Q2
dc.identifier.startpage: 3758
dc.identifier.trdizinid: 337504
dc.identifier.uri: https://doi.org/10.3906/elk-1901-231
dc.identifier.uri: https://search.trdizin.gov.tr/tr/yayin/detay/337504
dc.identifier.uri: https://hdl.handle.net/20.500.14669/2388
dc.identifier.volume: 27
dc.identifier.wos: WOS:000486425400034
dc.identifier.wosquality: Q4
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: TR-Dizin
dc.language.iso: en
dc.publisher: Tubitak Scientific & Technological Research Council Turkey
dc.relation.ispartof: Turkish Journal of Electrical Engineering and Computer Sciences
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_20241211
dc.subject: Feature extraction
dc.subject: speaker recognition
dc.subject: speech recognition
dc.title: Scale-invariant MFCCs for speech/speaker recognition
dc.type: Article
