Text Mining Approaches for Dependent Bug Report Assembly and Severity Prediction

  • Update: 03/11/2022

Bancha Luaphol

Department of Digital Technology, Kalasin University, Thailand

Jantima Polpinij

Department of Computer Science, Mahasarakham University, Thailand

Manasawee Kaenampornpan

Department of Computer Science, Mahasarakham University, Thailand

Abstract: Most existing bug report studies address only a single specific issue. Considering multiple issues at once is required for a more complete and comprehensive bug-fixing process. We took up this challenge and propose a method that analyzes two bug report issues using text mining techniques. First, dependent bug reports are assembled into individual clusters, and then the bug reports in each cluster are analyzed for their severity. Dependent bug report assembly is evaluated using threshold-based similarity analysis. Cosine similarity and BM25 are compared with term frequency (tf) weighting to identify the most appropriate method. Meanwhile, four classification algorithms, namely Random Forest (RF), Support Vector Machines (SVM) with the RBF kernel function, Multinomial Naïve Bayes (MNB), and k-Nearest Neighbor (k-NN), are used to model the bug severity predictor with four term weighting schemes: tf, term frequency-inverse document frequency (tf-idf), term frequency-inverse class frequency (tf-icf), and term frequency-inverse gravity moment (tf-igm). In the experiments, BM25 was found to be the most appropriate for dependent bug report assembly, while for severity prediction, RF with tf-icf weighting yielded the best performance.

Keywords: Bug report, dependent bug report assembly, bug severity prediction, threshold-based similarity analysis, cosine similarity, BM25, term weighting, classification algorithm.

Received April 28, 2020; accepted February 13, 2022

https://doi.org/10.34028/iajit/19/6/9
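
The threshold-based similarity analysis named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the greedy seed-based grouping strategy, the function names (tf_vector, cosine, group_reports), and the 0.5 threshold are all assumptions chosen for illustration. It covers only cosine similarity over raw tf vectors; BM25 scoring and the severity classifiers are not shown.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Return raw term-frequency counts for lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse tf vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_reports(reports, threshold=0.5):
    """Greedy grouping (an assumed strategy, not taken from the paper):
    a report joins the first cluster whose seed report is at least
    `threshold` similar; otherwise it starts a new cluster."""
    vectors = [tf_vector(r) for r in reports]
    clusters = []  # each cluster is a list of report indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical bug report summaries, invented for this example.
reports = [
    "app crashes when saving file",
    "crash when saving a file in app",
    "login button has wrong color",
]
clusters = group_reports(reports, threshold=0.5)
```

With these made-up summaries, the first two reports share four terms and fall into one group, while the third shares none and forms its own. In practice the threshold would be tuned experimentally, as the abstract indicates.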
