Gender Identification of the Speaker Using VQ Method

Vasif Nabiyev, Ergün Yücesoy

Published: Nov 6, 2013

Abstract

Speech is the easiest and most natural form of communication between people, and intensive research is being carried out to make this form of communication available through computers. Systems based on voice biometrics attract attention particularly for their low cost and ease of use. Compared with other biometric systems, they are much more practical to apply: for example, a voice recording can be obtained with a microphone placed in the environment, even without notifying the user. Remote access is a further advantage of voice biometrics. This study aims to determine the gender of a speaker automatically from the speech signal, which carries personal information. If the speaker's gender can be determined and models are built according to gender, the success of voice recognition systems can be increased significantly. In general, speaker recognition systems consist of two parts: feature extraction and feature matching. Feature extraction is the procedure in which a minimal set of information representing the speech and the speaker is derived from the voice signal. Different features are used in voice applications, such as LPC, MFCC and PLP; in this study, MFCC is used as the feature vector. Feature matching is the procedure in which the features derived from an unknown speaker are compared with those of a known group of speakers. According to the text used in the comparison, systems are divided into two types: text dependent and text independent. Text-dependent systems use the same text, whereas text-independent systems use different texts. Among current matching methods, DTW and HMM are text dependent, while VQ and GMM are text independent. In this study, the VQ approach is used because of its high success rate and simple implementation.
In this study, a system is proposed that determines the speaker's gender automatically and independently of the text. The proposed system consists of two stages: training and testing. In the training stage, MFCC feature vectors are calculated from voice recordings of speakers whose gender is known. MFCC features model the frequency perception of the human ear and are among the most widely preferred features. As in all voice analysis methods, MFCC is computed over short segments within which the properties of the voice are assumed to be stationary. These segments are generally chosen to be 20-30 ms long and are moved over the whole signal in shifts of 10-15 ms. A window function is applied to reduce the discontinuities at the edges of the resulting analysis frames; in voice applications the Hamming window is generally preferred. After windowing, the signal is transformed into the frequency domain using the FFT. The resulting FFT spectrum is converted to a mel spectrum using the mel scale, which models human frequency perception. The mel scale is linear up to 1 kHz and logarithmic above 1 kHz. The conversion uses triangular filters whose bandwidth varies linearly with the mel scale. The number of filters is generally chosen between 20 and 30. In the last stage, the logarithm of the mel spectrum is taken and the result is transformed back to the time domain. The resulting coefficients are called MFCCs. The MFCC features derived for each speaker are mapped to a smaller vector space using the VQ method. VQ is the mapping of a large vector space onto a limited number of subspaces. Each subspace is represented by a centre point called a code word, and the code words together constitute the codebook. One of the methods used to compress a set of N training vectors into a codebook of M vectors (M < N) is the LBG algorithm.
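
The MFCC steps described above (framing, Hamming window, spectrum, mel filter bank, logarithm, transform back) can be illustrated with a minimal sketch in standard-library Python. It is not the authors' implementation. The frame and hop lengths, the filter and coefficient counts, the mel formula 2595·log10(1 + f/700), and the use of a DCT for the final transform are all assumptions chosen from common practice, and the naive DFT stands in for an FFT only to keep the example self-contained.

```python
import cmath
import math


def hz_to_mel(f):
    # Common mel formula (assumed; the abstract gives none):
    # roughly linear below 1 kHz, logarithmic above.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    # Naive O(N^2) DFT; a real system would call an FFT routine instead.
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling slope, peak of 1 at mid
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_ms=25, hop_ms=10, n_filters=24, n_ceps=12):
    """Return one list of n_ceps cepstral coefficients per analysis frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]  # Hamming window
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame)
        # Log of mel-band energies, floored to avoid log(0).
        log_mel = [math.log(max(sum(h * p for h, p in zip(row, spec)), 1e-10))
                   for row in bank]
        # DCT-II of the log mel energies; coefficient 0 is skipped.
        ceps = [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                    for j, e in enumerate(log_mel))
                for c in range(1, n_ceps + 1)]
        features.append(ceps)
    return features


# Usage: 0.1 s of a 440 Hz tone sampled at 8 kHz (synthetic test signal).
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
feats = mfcc(tone, 8000)
print(len(feats), len(feats[0]))
```

Each frame yields one feature vector, so a recording becomes a sequence of vectors that the VQ stage can then compress.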

The system is trained with 16 recordings in which 8 male and 8 female speakers utter the same sentence. In the testing stage, 10 different sentences spoken by 56 female and 112 male speakers are used. Out of a total of 1680 test utterances, only 34 incorrect decisions are made, giving a success rate of 98%.
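
The training and testing stages can be sketched as follows: one codebook per gender is trained with an LBG-style splitting procedure, and a test utterance is assigned to the gender whose codebook gives the lower average distortion. This is a sketch under stated assumptions, not the authors' implementation. The codebook size, the additive split offset `eps`, the fixed iteration count, the minimum-distortion decision rule, and the synthetic two-dimensional data standing in for MFCC vectors are all hypothetical choices made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the code word closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def lbg(vectors, size, eps=0.01, iters=20):
    """Train a codebook by repeated splitting; size should be a power of two."""
    dim = len(vectors[0])
    # Start from a single code word: the centroid of all training vectors.
    codebook = [[sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]]
    while len(codebook) < size:
        # Split every code word into two slightly shifted copies.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            # Move each code word to its cell centroid; keep it if the cell is empty.
            codebook = [
                [sum(v[d] for v in cell) / len(cell) for d in range(dim)] if cell else cw
                for cw, cell in zip(codebook, cells)
            ]
    return codebook


def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest code word."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in vectors) / len(vectors)


def classify(vectors, codebooks):
    """Return the label whose codebook quantizes the vectors with least distortion."""
    return min(codebooks, key=lambda label: distortion(vectors, codebooks[label]))


def cloud(centre, n):
    """Synthetic 2-D stand-in for MFCC vectors (hypothetical data, not from the study)."""
    return [[random.gauss(centre, 0.5), random.gauss(centre, 0.5)] for _ in range(n)]


random.seed(0)
codebooks = {"male": lbg(cloud(0.0, 200), 4), "female": lbg(cloud(3.0, 200), 4)}
print(classify(cloud(3.0, 50), codebooks))
```

With the figures reported above, 34 errors in 1680 decisions correspond to 1646/1680 correct decisions, about 97.98%, which rounds to the stated 98%.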


How to Cite

Vasif Nabiyev, Ergün Yücesoy. (2013). Gender Identification of the Speaker Using VQ Method. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 1(1), 35–47. Retrieved from https://turcomat.org/index.php/turkbilmat/article/view/4

Issue

Vol. 1 No. 1 (2009)

Section

Research Articles

