An Optimized Model for Visual Speech Recognition Using HMM
Sujatha Paramasivam1 and Radhakrishnan Murugesanadar2
1Department of Computer Science and Engineering, Sudharsan Engineering College, India
2Department of Civil Engineering, Sethu Institute of Technology, India
Abstract: Visual Speech Recognition (VSR) is the task of identifying spoken words from visual data alone, without the corresponding acoustic signals. It is useful in situations where conventional audio processing is ineffective, such as very noisy environments, or impossible, such as when audio signals are unavailable. In this paper, an optimized model for VSR is introduced that uses a simple geometric projection method for mouth localization, which reduces computation time. A 16-point distance method and a chain code method are used to extract the visual features, and their recognition performance is compared using a Hidden Markov Model (HMM) classifier. To optimize the model, the most prominent features are selected from the large set of extracted visual attributes using the Discrete Cosine Transform (DCT). The experiments were conducted on an in-house database of 10 digits (1 to 10) spoken by 10 subjects and tested with the 10-fold cross-validation technique. The model is also evaluated using the metrics specificity, sensitivity, and accuracy. Unlike other models in the literature, the proposed method is more robust to subject variation, with high sensitivity and specificity for the digits 1 to 10. The results show that the combination of the 16-point distance method and DCT gives better results than either the 16-point distance method alone or the chain code method.
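As an illustration only (the paper's implementation is not reproduced here), a minimal Python sketch of the pipeline described above, training one HMM per digit on DCT-compressed visual features, might look as follows. The library choices (scipy, hmmlearn), the number of retained DCT coefficients, and the HMM settings are assumptions made for the sketch, not the authors' code.

```python
import numpy as np
from scipy.fft import dct
from hmmlearn import hmm

N_COEFF = 8  # number of low-order DCT coefficients to keep (assumed value)

def compress_features(frames):
    """Apply a 1-D DCT to each per-frame feature vector (e.g., the 16
    lip distances) and keep the first N_COEFF coefficients, which carry
    most of the energy of a smoothly varying signal."""
    return np.array([dct(np.asarray(f, dtype=float), norm="ortho")[:N_COEFF]
                     for f in frames])

def train_digit_models(training_data, n_states=5):
    """Train one Gaussian HMM per digit. training_data maps a digit label
    to a list of utterances, each an array of per-frame feature vectors."""
    models = {}
    for digit, utterances in training_data.items():
        compressed = [compress_features(u) for u in utterances]
        X = np.vstack(compressed)               # all frames stacked
        lengths = [len(c) for c in compressed]  # frames per utterance
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=100)
        model.fit(X, lengths)
        models[digit] = model
    return models

def recognize(models, utterance):
    """Return the digit whose HMM assigns the highest log-likelihood
    to the compressed feature sequence."""
    feats = compress_features(utterance)
    return max(models, key=lambda d: models[d].score(feats))
```

In use, recognize(models, utterance) would be called on each test utterance and the predicted digits compared against ground truth under the 10-fold cross-validation protocol.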
Keywords: Visual speech recognition, feature extraction, discrete cosine transform, chain code, hidden Markov model.
Received March 20, 2015; accepted August 31, 2015
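For reference, the sensitivity, specificity, and accuracy used to evaluate the model are the standard per-class confusion-matrix quantities, where TP, TN, FP, and FN count true positives, true negatives, false positives, and false negatives for a given digit:

\[ \text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}, \qquad \text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]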