Yazar "Dişken, Gökay" seçeneğine göre listele
Now showing 1 - 6 of 6
Item: Fast Computation of Parameters of the Random Variable that is Logarithm of Sum of Two Independent Log-normally Distributed Random Variables (2022). Tüfekçi, Zekeriya; Dişken, Gökay.
In this paper, two fast methods are proposed for computing the mean and variance of a random variable that is the logarithm of the sum of two independent log-normally distributed random variables. It is shown that the mean and variance can be computed using only a one-dimensional numerical integration method. The speed of the proposed algorithms is compared with that of the baseline algorithm. Simulation results showed that the first proposed method decreases the execution time by an average of 43.98%. They also showed that the second proposed method is faster than the first for variances greater than 0.325.

Item: INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION (2024). Aydın, Barış; Dişken, Gökay.
Ensuring security in speaker recognition systems is crucial. In past years, it has been demonstrated that spoofing attacks can fool these systems. To deal with this issue, spoof speech detection systems have been developed. While these systems perform well, their effectiveness tends to degrade under noise. Traditional speech enhancement methods are not effective at improving performance; they can even make it worse. In this paper, the performance of a noise mask obtained via a convolutional neural network for reducing noise effects is investigated. The mask is used to suppress noisy regions of spectrograms in order to extract robust i-vectors. The proposed system is tested on the ASVspoof 2015 database with three different noise types and achieves superior performance compared to traditional systems. However, there is a loss of performance for noise types that are not encountered during the training phase.

Item: Increasing the Robustness of i-vectors with Model Compensated First Order Statistics (2023). Dişken, Gökay; Tüfekci, Zekeriya.
Speaker recognition systems have achieved significant improvements over the last decade, especially due to the performance of i-vectors. Despite these achievements, mismatch between training and test data considerably affects recognition performance. In this paper, a solution is offered to increase robustness against additive noise by inserting model compensation techniques into the i-vector extraction scheme. For stationary noises, model compensation techniques produce highly robust systems; Parallel Model Compensation and Vector Taylor Series are considered state-of-the-art among them. By applying these methods to the first-order statistics, a noisy total variability space training is obtained, which reduces the mismatch caused by additive noise. All other parts of the conventional i-vector scheme, such as total variability matrix training, i-vector dimensionality reduction, and i-vector scoring, remain unchanged. The proposed method was tested with four different noise types at signal-to-noise ratios (SNRs) from -6 dB to 18 dB in 6 dB steps. High reductions in equal error rates were achieved with both methods, even at the lowest SNR levels. On average, the proposed approach produced a more than 50% relative reduction in equal error rate.
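Relating to the first entry above (Fast Computation of Parameters of the Random Variable that is Logarithm of Sum of Two Independent Log-normally Distributed Random Variables), one way to see why the mean reduces to a one-dimensional integral is the identity ln(e^A + e^B) = A + ln(1 + e^(B-A)): for independent Gaussian A and B, the difference D = B - A is again Gaussian, so E[ln(e^A + e^B)] = E[A] + E[ln(1 + e^D)] is a single one-dimensional expectation. The sketch below evaluates that integral with Gauss-Hermite quadrature and checks it against a Monte Carlo estimate; it illustrates the general idea only and is not the paper's algorithm, and the function name, quadrature order, and test parameters are assumptions. The variance needs similar but more careful treatment (A and D are correlated) and is not shown.

```python
import numpy as np

def mean_log_sum_lognormal(mu_a, var_a, mu_b, var_b, order=64):
    """E[ln(e^A + e^B)] for independent A ~ N(mu_a, var_a), B ~ N(mu_b, var_b).

    Uses ln(e^A + e^B) = A + ln(1 + e^(B-A)) with D = B - A ~ N(mu_b - mu_a,
    var_a + var_b), so only a 1D Gauss-Hermite quadrature is needed.
    """
    mu_d = mu_b - mu_a
    sd_d = np.sqrt(var_a + var_b)
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    d = mu_d + np.sqrt(2.0) * sd_d * nodes   # quadrature nodes mapped to D's scale
    g = np.logaddexp(0.0, d)                 # ln(1 + e^d), numerically stable
    return mu_a + (weights @ g) / np.sqrt(np.pi)

if __name__ == "__main__":
    # Example parameters are illustrative only.
    mu_a, var_a, mu_b, var_b = 0.3, 0.4, -0.2, 0.25
    rng = np.random.default_rng(0)
    a = rng.normal(mu_a, np.sqrt(var_a), 200_000)
    b = rng.normal(mu_b, np.sqrt(var_b), 200_000)
    mc = np.log(np.exp(a) + np.exp(b)).mean()   # Monte Carlo reference
    print(mean_log_sum_lognormal(mu_a, var_a, mu_b, var_b), mc)
```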
Item: Konuşmacı Tanıma Sistemlerinde Güvenliğin ve Gürbüzlüğün Artırılmasına Yönelik Derin Öğrenme Tabanlı Yöntemlerin Geliştirilmesi (Development of Deep Learning Based Methods for Increasing the Security and Robustness of Speaker Recognition Systems) (2023). Dişken, Gökay.
This project focuses on the detection of synthesized speech and replay attacks, which, owing to developing technologies, create security vulnerabilities for speaker recognition systems. Four databases released by the ASVspoof organization between 2015 and 2021 were used. These databases contain various speech synthesis, voice conversion, and replay (playback of recorded audio) attacks. In addition, in line with the literature, additive noise samples from the NOISEX-92 and QUT-NOISE datasets were used, with the aim of developing robust systems that can detect spoofed speech even under noise. One of the methods used is the i-vector approach, which also performs well in speaker recognition systems; these vectors are low-dimensional, fixed-length representations of speech data of varying duration. To obtain robust i-vectors, a deep learning model called a denoising autoencoder was used to map noisy vectors to their clean counterparts. As a further method, robustness was achieved at the i-vector extraction stage by applying a noise mask. Considering the successful results of deep learning models in speaker recognition and spoofed speech detection studies, complex architectures based on convolutional neural networks (CNNs) were also employed. A noise mask was obtained using a differential CNN, achieving performance comparable to the best studies in this field. A delta convolution algorithm and suitable filters were developed, and this new approach was shown to work both with traditional cepstral features and with raw audio data. It was also shown that performance gains can be obtained while greatly reducing the number of learnable parameters compared to similar models. In cross-dataset tests, one of the best results in the literature was achieved.

Item: Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder (2023). Dişken, Gökay; Tüfekçi, Zekeriya.
Audio spoof detection has recently gained the attention of researchers, as detecting spoofed speech is vital for automatic speaker recognition systems. Publicly available datasets have also accelerated studies in this area. Many different features and classifiers have been proposed to address the spoofed speech detection problem, and some of them achieve considerably high performance. However, under additive noise, spoof detection performance drops rapidly, and the number of studies on robust spoofed speech detection is very limited. The problem becomes more interesting as conventional speech enhancement methods have reportedly performed worse than no enhancement at all. In this work, i-vectors are used for spoof detection, and a discriminative denoising autoencoder (DAE) network is used to obtain enhanced (clean) i-vectors from their noisy counterparts. Once the enhanced i-vectors are obtained, they can be treated as normal i-vectors and scored or classified without any modification to the classifier. Data from the ASVspoof 2015 challenge are used with five different additive noise types, following a configuration similar to previous studies. The DAE is trained in a multicondition manner, using both clean and corrupted i-vectors. Three different noise types at various signal-to-noise ratios are used to create the corrupted i-vectors, and two further noise types are used only in the test stage to simulate unknown noise conditions. Experimental results show that the proposed DAE approach is more effective than conventional speech enhancement methods.
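For the Noise-Robust Spoofed Speech Detection entry above, a minimal sketch of a denoising autoencoder that maps noisy i-vectors to estimates of their clean counterparts is given below, assuming PyTorch, 400-dimensional i-vectors, a simple fully connected network, and an MSE objective; the paper's discriminative variant and its actual architecture, losses, and training schedule are not reproduced here.

```python
import torch
import torch.nn as nn

# Dimensions are assumptions for illustration (e.g. 400-dim i-vectors).
IVEC_DIM, HIDDEN = 400, 512

class DenoisingAutoencoder(nn.Module):
    """Maps a noisy i-vector to an estimate of its clean counterpart."""
    def __init__(self, dim=IVEC_DIM, hidden=HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

def train_dae(noisy, clean, epochs=50, lr=1e-3):
    """noisy, clean: paired (N, IVEC_DIM) tensors (multicondition training data)."""
    model = DenoisingAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(noisy), clean)  # pull noisy vectors toward clean ones
        loss.backward()
        opt.step()
    return model

# At test time, model(noisy_ivector) is scored like any other i-vector,
# with no change to the downstream classifier.
```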
Item: Recognition of non-speech sounds using Mel-frequency cepstrum coefficients and dynamic time warping method (Institute of Electrical and Electronics Engineers Inc., 2015). Dişken, Gökay; Ibrikçi, Turgay.
With developing technology, speech recognition systems are becoming more common in our daily lives. The sounds in our environment are not only pure speech; because of this, it is important for cochlear implants, unmanned vehicles, and security systems to be able to recognize other sounds as well. In this work, Mel-frequency cepstrum coefficients, one of the most widely used feature extraction methods in speech recognition, are applied to various nature and animal sounds. Because the sounds do not all have the same duration, dynamic time warping, a method also used in speech recognition, is preferred for classifying the feature vectors. The differences in sound duration affect the lengths of the feature sequences, and the dynamic time warping method overcomes these differences. One reference recording and 10 test recordings were obtained from 10 different sound sources. The correct classification rate is found to be 88%. © 2015 IEEE.
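As a rough illustration of the MFCC plus dynamic time warping scheme described in the last entry, the sketch below extracts MFCC sequences and assigns a test recording to the reference with the lowest DTW alignment cost; librosa is assumed for MFCC extraction, and the number of coefficients and the Euclidean local distance are assumptions rather than the paper's settings.

```python
import numpy as np
import librosa  # assumed available for MFCC extraction

def mfcc_features(path, n_mfcc=13):
    """MFCC sequence (frames x coefficients) for one recording."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def dtw_distance(a, b):
    """Classic dynamic time warping cost between two feature sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classify(test_path, reference_paths):
    """Assign the test recording to the reference label with the lowest DTW cost."""
    test = mfcc_features(test_path)
    refs = {label: mfcc_features(p) for label, p in reference_paths.items()}
    return min(refs, key=lambda label: dtw_distance(test, refs[label]))
```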