A robust polynomial regression-based voice activity detector for speaker verification

dc.authorid: Tufekci, Zekeriya/0000-0001-7835-2741
dc.authorid: Cevik, Ulus/0000-0002-0956-9725
dc.authorid: Disken, Gokay/0000-0002-8680-0636
dc.contributor.author: Disken, Gokay
dc.contributor.author: Tufekci, Zekeriya
dc.contributor.author: Cevik, Ulus
dc.date.accessioned: 2025-01-06T17:44:03Z
dc.date.available: 2025-01-06T17:44:03Z
dc.date.issued: 2017
dc.description.abstract: Robustness against background noise is a major research area for speech-related applications such as speech recognition and speaker recognition. One of the many solutions for this problem is to detect speech-dominant regions by using a voice activity detector (VAD). In this paper, a second-order polynomial regression-based algorithm is proposed with a similar function as a VAD for text-independent speaker verification systems. The proposed method aims to separate steady noise/silence regions, steady speech regions, and speech onset/offset regions. The regression is applied independently to each filter band of a mel spectrum, which makes the algorithm fit seamlessly to the conventional extraction process of the mel-frequency cepstral coefficients (MFCCs). The k-means algorithm is also applied to estimate average noise energy in each band for spectral subtraction. A pseudo SNR-dependent linear thresholding for the final VAD output decision is introduced based on the k-means energy centers. This thresholding considers the speech presence in each band. Conventional VADs usually neglect the deteriorative effects of the additive noise in the speech regions. Contrary to this, the proposed method decides not only on the speech presence, but also on whether the frame is dominated by the speech or by the noise. Performance of the proposed algorithm is compared with a continuous noise tracking method and another VAD method in speaker verification experiments, where five different noise types at five different SNR levels were considered. The proposed algorithm showed superior verification performance both with the conventional GMM-UBM method and with the state-of-the-art i-vector method.
dc.identifier.doi: 10.1186/s13636-017-0120-6
dc.identifier.issn: 1687-4722
dc.identifier.scopus: 2-s2.0-85031497826
dc.identifier.scopusquality: Q2
dc.identifier.uri: https://doi.org/10.1186/s13636-017-0120-6
dc.identifier.uri: https://hdl.handle.net/20.500.14669/2899
dc.identifier.wos: WOS:000413258200001
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Springer International Publishing Ag
dc.relation.ispartof: Eurasip Journal on Audio Speech and Music Processing
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_20241211
dc.subject: Polynomial regression
dc.subject: Robust speaker recognition
dc.subject: Voice activity detection
dc.title: A robust polynomial regression-based voice activity detector for speaker verification
dc.type: Article
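
The abstract names three building blocks: a second-order polynomial fit applied independently to each mel filter band, a k-means estimate of the average noise energy per band, and spectral subtraction. The sketch below illustrates those generic techniques only and is not the authors' implementation. The function names, the sliding window of 9 frames, the two-cluster setup, and the subtraction floor are all assumptions made for illustration; the record does not specify them, nor the thresholding rule used for the final VAD decision.

```python
import numpy as np


def quadratic_fit_per_band(log_mel, win=9):
    """Fit y = a*t^2 + b*t + c over a sliding window, one fit per band.

    log_mel: array of shape (n_frames, n_bands).
    Returns an array of shape (n_frames - win + 1, 3, n_bands) holding
    (a, b, c) for each window position. The window length is an assumption.
    """
    t = np.arange(win) - win // 2  # time axis centred on the window
    n_frames, n_bands = log_mel.shape
    coeffs = np.empty((n_frames - win + 1, 3, n_bands))
    for i in range(n_frames - win + 1):
        # np.polyfit accepts a 2-D y and fits every column independently.
        coeffs[i] = np.polyfit(t, log_mel[i:i + win], deg=2)
    return coeffs


def two_means_centers(x, n_iter=20):
    """Plain 1-D k-means with k=2; returns (low_center, high_center)."""
    lo, hi = float(x.min()), float(x.max())
    for _ in range(n_iter):
        near_hi = np.abs(x - hi) < np.abs(x - lo)
        if near_hi.all() or not near_hi.any():
            break  # one cluster is empty, e.g. constant input
        lo, hi = float(x[~near_hi].mean()), float(x[near_hi].mean())
    return lo, hi


def subtract_noise(mel_energy, floor=1e-3):
    """Subtract the low k-means centre of each band as a noise estimate.

    mel_energy: linear-domain energies, shape (n_frames, n_bands).
    The relative floor is a hypothetical safeguard against negative energies.
    """
    noise = np.array([two_means_centers(mel_energy[:, b])[0]
                      for b in range(mel_energy.shape[1])])
    return np.maximum(mel_energy - noise, floor * mel_energy)
```

One plausible reading, offered here as an assumption, is that windows whose slope and curvature are both near zero correspond to steady regions, while large values indicate onsets or offsets; the paper itself should be consulted for the actual decision rule.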
