Enhanced Clustering-Based Topic Identification of Transcribed Arabic Broadcast News

Ahmed Jafar1, Mohamed Fakhr1, and Mohamed Farouk2

1Department of Computer Science, Arab Academy for Science and Technology, Egypt

2Department of Engineering Math and Physics, Faculty of Engineering, Egypt

Abstract: This research presents enhanced topic identification of transcribed Arabic broadcast news using clustering techniques. The enhancement includes applying a new stemming technique, “rule-based light stemming”, to balance the negative effects of the stemming errors associated with light stemming and root-based stemming. A new possibilistic clustering technique is also applied to evaluate the degree of membership of every transcribed document in every predefined topic, thereby detecting the documents that cause topic confusion and reduce the accuracy of the topic-clustering process. The evaluation showed that rule-based light stemming combined with a spectral clustering technique achieved the highest accuracy, and that this accuracy increased further after the confusing documents were excluded.

Keywords: Arabic speech transcription, topic clustering.

Received June 17, 2014; accepted January 27, 2015

 
