Text Mining Approaches for Dependent Bug Report Assembly and Severity Prediction
Bancha Luaphol, Department of Digital Technology, Kalasin University, Thailand
Jantima Polpinij, Department of Computer Science, Mahasarakham University, Thailand
Manasawee Kaenampornpan, Department of Computer Science, Mahasarakham University, Thailand
Abstract: Most existing bug report studies focus on solving only a single specific issue. A more complete and comprehensive bug-fixing process requires considering multiple issues at once. We took up this challenge and proposed a method, based on text mining techniques, that analyzes two issues of bug reports. First, dependent bug reports are assembled into individual clusters, and then the bug reports in each cluster are analyzed for their severity. Dependent bug report assembly is performed with threshold-based similarity analysis, in which cosine similarity and BM25, both using term frequency (tf) weighting, are compared to identify the more appropriate measure. For severity prediction, four classification algorithms, namely Random Forest (RF), Support Vector Machines (SVM) with the RBF kernel function, Multinomial Naïve Bayes (MNB), and k-Nearest Neighbor (k-NN), are used to build the bug severity predictor with four term weighting schemes: tf, term frequency-inverse document frequency (tf-idf), term frequency-inverse class frequency (tf-icf), and term frequency-inverse gravity moment (tf-igm). The experiments showed that BM25 was the most appropriate measure for dependent bug report assembly, while for severity prediction, RF with tf-icf weighting yielded the best performance.
Keywords: Bug report, dependent bug report assembly, bug severity prediction, threshold-based similarity analysis, cosine similarity, BM25, term weighting, classification algorithm.
Received April 28, 2020; accepted February 13, 2022
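The two text mining steps summarized in the abstract can be illustrated with a small sketch. This is a minimal Python example under stated assumptions, not the authors' implementation: tokenization is lowercase whitespace splitting; BM25 uses the common defaults k1 = 1.2 and b = 0.75 with a Lucene-style idf; because BM25 is asymmetric, a pairwise score is taken as the average of both directions; reports whose similarity meets the threshold are joined transitively (union-find) into one cluster; and tf-icf follows one common variant, tf(t) × log(C / cf(t)), where C is the number of severity classes and cf(t) the number of classes containing term t. The helpers `assemble` and `tf_icf` and the threshold values are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization, no stemming or stop-word removal.
    return text.lower().split()


def cosine_tf(a: Counter, b: Counter) -> float:
    # Cosine similarity between raw term-frequency (tf) vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class BM25:
    """Okapi BM25 over a fixed collection; k1, b and the idf form are assumed defaults."""

    def __init__(self, docs: list[Counter], k1: float = 1.2, b: float = 0.75):
        self.docs, self.k1, self.b = docs, k1, b
        n = len(docs)
        self.avgdl = sum(sum(d.values()) for d in docs) / n
        df = Counter(t for d in docs for t in d)
        # Lucene-style idf, which stays non-negative for very common terms.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: Counter, j: int) -> float:
        # Score document j against the distinct terms of `query`, using tf only.
        d = self.docs[j]
        norm = self.k1 * (1 - self.b + self.b * sum(d.values()) / self.avgdl)
        return sum(self.idf[t] * d[t] * (self.k1 + 1) / (d[t] + norm)
                   for t in query if d[t] > 0)


def assemble(reports: list[str], measure: str = "bm25", threshold: float = 1.0) -> list[list[int]]:
    """Hypothetical helper: join reports whose pairwise similarity >= threshold."""
    docs = [Counter(tokenize(r)) for r in reports]
    bm25 = BM25(docs)
    parent = list(range(len(docs)))  # union-find forest over report indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if measure == "cosine":
                s = cosine_tf(docs[i], docs[j])
            else:
                # BM25 is asymmetric; average both directions (an assumption).
                s = (bm25.score(docs[i], j) + bm25.score(docs[j], i)) / 2
            if s >= threshold:
                parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def tf_icf(tokens: list[str], class_docs: dict[str, list[list[str]]]) -> dict[str, float]:
    """tf-icf under one common variant: tf(t) * log(C / cf(t))."""
    n_classes = len(class_docs)
    cf = Counter()  # cf(t): number of classes whose training reports contain t
    for docs in class_docs.values():
        cf.update({t for d in docs for t in d})
    return {t: f * math.log(n_classes / cf[t])
            for t, f in Counter(tokens).items() if cf[t]}  # skip unseen terms
```

Cosine similarity on tf vectors is bounded in [0, 1], whereas BM25 scores depend on collection statistics such as document length and document frequency, so each measure needs its own threshold, tuned on held-out data. Under this tf-icf variant, a term that occurs in every severity class receives zero weight, which is the intended effect of down-weighting terms that do not discriminate between classes.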