Two-Level Classification in Determining the Age
and Gender Group of a Speaker
Ergün Yücesoy
Vocational School of
Technical Sciences, Ordu University, Turkey
Abstract: This study addresses the
classification of speakers according to age and gender. Age
and gender classes were first examined separately and then combined
into a single classification with a total of 7 classes. Speech signals
represented by Mel-Frequency Cepstral Coefficients (MFCC) and delta parameters
were converted into Gaussian Mixture Model (GMM) mean supervectors and
classified with a Support Vector Machine (SVM). The GMM mean supervectors
were formed using the Maximum-A-Posteriori (MAP) adapted GMM-Universal
Background Model (UBM) configuration, and the number of components was varied from
16 to 512 to determine the optimum number of components. Using the aGender
dataset, the gender classification accuracy of the developed system was
measured as 99.02% for two classes and 92.58% for three classes, and the age group
classification accuracy was measured as 67.03% for females and 63.79% for males.
When age and gender classes were classified together in one step, an accuracy
of 61.46% was obtained. In this study, a two-level approach was proposed for
classifying age and gender classes together. In this approach, the
speakers were first divided into three classes (child, male, and female); the
males and females were then classified according to their age groups, yielding a
7-class classification. The two-level approach increased the
accuracy of the classification in all cases except when 32-component GMMs
were used. The highest improvement, 2.45%, was achieved with 64-component
GMMs, while an improvement of 0.79% was achieved with 256-component GMMs.
Keywords: GMM, mean supervector, speaker age and gender classification, SVM,
two-level classification.
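
The two-level decision described in the abstract can be illustrated with a minimal sketch. It shows only the routing logic: a first-level classifier assigns child, female, or male, and a gender-specific second-level classifier assigns the age group. The function names, the age-group labels (young, adult, senior), and the threshold-based toy classifiers are assumptions made for illustration; in the study, both levels are SVMs operating on GMM mean supervectors.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def two_level_predict(x: Features,
                      gender_clf: Classifier,
                      age_clfs: Dict[str, Classifier]) -> str:
    """Route one feature vector through the two-level hierarchy."""
    group = gender_clf(x)            # level 1: "child", "female" or "male"
    if group == "child":
        return "child"               # children are not split by age group
    age = age_clfs[group](x)         # level 2: gender-specific age classifier
    return f"{age}_{group}"


# Toy stand-ins for trained classifiers (hypothetical thresholds, not
# taken from the study). x[0] drives level 1, x[1] drives level 2.
def toy_gender_clf(x: Features) -> str:
    if x[0] > 250.0:
        return "child"
    return "female" if x[0] > 165.0 else "male"


def make_toy_age_clf(young_max: float, adult_max: float) -> Classifier:
    def clf(x: Features) -> str:
        if x[1] < young_max:
            return "young"
        return "adult" if x[1] < adult_max else "senior"
    return clf


toy_age_clfs = {
    "female": make_toy_age_clf(0.3, 0.7),
    "male": make_toy_age_clf(0.4, 0.8),
}

if __name__ == "__main__":
    for sample in ([300.0, 0.1], [200.0, 0.5], [120.0, 0.9]):
        print(sample, "->", two_level_predict(sample, toy_gender_clf, toy_age_clfs))
```

Because each second-level classifier only sees speakers of one gender, it can be trained and tuned separately, which is the design choice that distinguishes this scheme from the one-step 7-class classification.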