Arabic Speaker-Independent Continuous Automatic Speech Recognition Based on a Phonetically Rich and Balanced Speech Corpus

Mohammad Abushariah1,2, Raja Ainon1, Roziati Zainuddin1, Moustafa Elshafei3, and Othman Khalifa4
1Faculty of Computer Science and Information Technology, University of Malaya, Malaysia
2King Abdullah II School for Information Technology, University of Jordan, Jordan
3Department of Systems Engineering, King Fahd University of Petroleum and Minerals, Saudi Arabia
4Faculty of Engineering, International Islamic University Malaysia, Malaysia
 
Abstract: This paper proposes an efficient and effective framework for the design and development of a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The speech corpus contains a total of 415 sentences recorded by 40 native Arabic speakers (20 male and 20 female) from 11 Arab countries representing the three major regions of the Arab world (Levant, Gulf, and Africa). The proposed system is based on the Carnegie Mellon University (CMU) Sphinx tools; the Cambridge HTK tools were also used at some testing stages. The speech engine uses Hidden Markov Models (HMMs) with 3 emitting states for tri-phone based acoustic models. Based on experimental analysis of about 7 hours of training speech data, the acoustic model performs best with a continuous observation probability model of 16 Gaussian mixture distributions and with the state distributions tied to 500 senones. The language model contains both bi-grams and tri-grams. For similar speakers but different sentences, the system obtained a word recognition accuracy of 92.67% and 93.88% and a Word Error Rate (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers with similar sentences, the system obtained a word recognition accuracy of 95.92% and 96.29% and a WER of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers and different sentences, the system obtained a word recognition accuracy of 89.08% and 90.23% and a WER of 15.59% and 14.44% with and without diacritical marks, respectively.
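
In each result above, the reported accuracy and WER add up to slightly more than 100%. This is consistent with a common scoring convention in which word accuracy counts only substitutions and deletions against the reference, while WER also counts insertions. The abstract does not state the exact formulas, so the Python sketch below is an illustration under that assumption and is not the authors' scoring code. The function names and the placeholder tokens are hypothetical.

```python
def align_counts(ref, hyp):
    """Count (substitutions, deletions, insertions) along one
    minimum-edit-distance alignment of two word lists."""
    n, m = len(ref), len(hyp)
    # cell[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cell = [[None] * (m + 1) for _ in range(n + 1)]
    cell[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)   # all reference words deleted
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)   # all hypothesis words inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cell[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, k)           # match, no cost
            else:
                diag = (t + 1, s + 1, d, k)   # substitution
            t, s, d, k = cell[i - 1][j]
            dele = (t + 1, s, d + 1, k)       # deletion
            t, s, d, k = cell[i][j - 1]
            ins = (t + 1, s, d, k + 1)        # insertion
            cell[i][j] = min(diag, dele, ins, key=lambda c: c[0])
    _, subs, dels, inss = cell[n][m]
    return subs, dels, inss


def word_error_rate(ref, hyp):
    """WER = (S + D + I) / N, with N the number of reference words."""
    s, d, i = align_counts(ref, hyp)
    return (s + d + i) / len(ref)


def word_accuracy(ref, hyp):
    """Assumed definition: (N - S - D) / N, ignoring insertions."""
    s, d, _ = align_counts(ref, hyp)
    return (len(ref) - s - d) / len(ref)


# Placeholder tokens, not data from the corpus.
reference = "a b c d".split()
hypothesis = "a x c d e".split()   # one substitution, one insertion
print(word_accuracy(reference, hypothesis), word_error_rate(reference, hypothesis))
```

With one substitution and one insertion against four reference words, the sketch gives an accuracy of 0.75 and a WER of 0.5, which sum to more than 1 for the same reason the reported percentages exceed 100%.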


Keywords: Arabic automatic speech recognition, Arabic speech corpus, phonetically rich and balanced, acoustic model, statistical language model.

Received December 22, 2009; accepted May 20, 2010
