Self-Organizing Map vs Initial Centroid Selection Optimization to Enhance K-Means with Genetic Algorithm to Cluster Transcribed Broadcast News Documents
Ahmed Maghawry1, Yasser Omar1, and Amr Badr2
1Department of Computer Science, Arab Academy for Science and Technology, Egypt
2Department of Computer Science, Cairo University, Egypt
Abstract: This research combines several artificial intelligence techniques to improve the clustering of transcribed text documents obtained from audio sources. Many clustering techniques suffer from drawbacks that can drive the algorithm toward suboptimal solutions; handling these drawbacks is essential to obtain better clustering results. The main goal of our research is to enhance automatic topic clustering of transcribed speech documents and to compare two approaches on the same data set: the K-means algorithm with our Initial Centroid Selection Optimization (ICSO) [16], genetic algorithm optimization, and the Chi-square similarity measure, and a self-organizing map used to enhance the clustering process. The two techniques are compared in terms of accuracy. The evaluation showed that K-means with ICSO and the genetic algorithm achieved the highest average accuracy.
Keywords: Clustering, K-means, self-organizing maps, genetic algorithm, speech transcripts, centroid selection.
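As a rough illustration of the K-means component named in the abstract, the sketch below runs K-means over term-frequency vectors with a Chi-square distance. It rests on several assumptions: it uses one common form of the Chi-square distance, sum over i of (x_i - y_i)^2 / (x_i + y_i), which may differ from the measure used in the paper; the initial centroids are supplied by hand, so neither ICSO nor the genetic algorithm is reproduced; and the function names and toy data are hypothetical.

```python
def chi_square_distance(x, y):
    """Chi-square distance between two non-negative term-frequency vectors."""
    total = 0.0
    for a, b in zip(x, y):
        s = a + b
        if s > 0:  # skip terms that are absent from both vectors
            total += (a - b) ** 2 / s
    return total


def assign_to_nearest(docs, centroids):
    """Assignment step: index of the closest centroid for each document."""
    return [
        min(range(len(centroids)),
            key=lambda k: chi_square_distance(d, centroids[k]))
        for d in docs
    ]


def update_centroids(docs, labels, centroids):
    """Update step: per-cluster mean vector, as in standard K-means.

    An empty cluster keeps its previous centroid (an assumption of this sketch).
    """
    updated = []
    for k, old in enumerate(centroids):
        members = [d for d, label in zip(docs, labels) if label == k]
        if not members:
            updated.append(list(old))
            continue
        updated.append([sum(col) / len(members) for col in zip(*members)])
    return updated


def kmeans_chi2(docs, init_centroids, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign_to_nearest(docs, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update_centroids(docs, labels, centroids)
    return labels, centroids


# Hypothetical toy data: four documents over a three-term vocabulary.
docs = [[3, 0, 1], [4, 0, 0], [0, 5, 1], [0, 4, 2]]
labels, centroids = kmeans_chi2(docs, init_centroids=[docs[0], docs[2]])
```

The result depends on the hand-picked `init_centroids`, which is the sensitivity that a centroid selection scheme such as ICSO is meant to address.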