Enhanced Clustering-Based Topic Identification of
Transcribed Arabic Broadcast News
Ahmed Jafar1, Mohamed Fakhr1, and Mohamed Farouk2
1Department of Computer Science, Arab Academy for Science and Technology, Egypt
2Department of Engineering Math and Physics, Faculty of Engineering, Egypt
Abstract: This research presents an enhanced approach to
topic identification of transcribed Arabic broadcast news using clustering
techniques. The enhancement includes applying a new stemming technique,
“rule-based light stemming”, to balance the negative effects of the stemming
errors associated with light stemming and root-based stemming. A new
possibilistic clustering technique is also applied to evaluate the degree
of membership that every transcribed document has in every predefined
topic, and hence to detect the documents that cause topic confusion and negatively
affect the accuracy of the topic-clustering process. The evaluation showed
that rule-based light stemming combined with the spectral clustering
technique achieved the highest accuracy, and that this accuracy increased further
once the confusing documents were excluded.
Keywords: Arabic speech transcription, topic clustering.
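
As an illustration of the membership-based filtering summarized in the abstract, the following minimal sketch computes a possibilistic degree of membership for each document in each topic and flags documents whose two strongest memberships are nearly equal. It is not the authors' implementation: the abstract does not state the formula, so the sketch assumes the typicality function of standard possibilistic c-means, and the function names, the `eta` scale parameters, the `margin` threshold, and the toy distances are all hypothetical.

```python
import numpy as np


def possibilistic_memberships(distances_sq, eta, m=2.0):
    """Degree of membership of each document in each topic.

    Assumption: uses the possibilistic c-means typicality formula
    u = 1 / (1 + (d^2 / eta)^(1 / (m - 1))); the paper may differ.

    distances_sq: (n_docs, n_topics) squared distances to topic centroids.
    eta:          (n_topics,) positive scale parameter per topic.
    m:            fuzzifier, must be greater than 1.
    """
    ratio = np.asarray(distances_sq, dtype=float) / np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + ratio ** (1.0 / (m - 1.0)))


def flag_confusing(memberships, margin=0.2):
    """Flag documents whose two highest memberships differ by less than margin.

    The margin value is a hypothetical threshold chosen for illustration.
    """
    top_two = np.sort(memberships, axis=1)[:, -2:]  # ascending: [second, best]
    return (top_two[:, 1] - top_two[:, 0]) < margin


# Toy example: three documents, two topics (made-up squared distances).
toy_distances_sq = np.array([[0.1, 4.0],
                             [1.0, 1.2],
                             [5.0, 0.2]])
toy_eta = np.array([1.0, 1.0])

toy_memberships = possibilistic_memberships(toy_distances_sq, toy_eta)
toy_confusing = flag_confusing(toy_memberships)
print(toy_memberships)
print(toy_confusing)
```

Unlike fuzzy memberships, these values are not forced to sum to one across topics, so a document can be weakly typical of every topic or strongly typical of several. In this toy data the second document sits almost equally close to both topics, which makes it the kind of document that would be excluded before the final clustering accuracy is measured.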