A Novel Approach to Maximize G-mean in
Nonstationary Data with Recurrent Imbalance Shifts
Radhika Kulkarni1, S. Revathy1, and Suhas Patil2
1Department of Computer Science Engineering, Sathyabama Institute of Science and Technology, India
2Department of Computer Science Engineering, Bharati Vidyapeeth’s College of Engineering, India
Abstract: One of the major difficulties in
the classification of nonstationary data is handling class imbalance.
Imbalanced data have far more samples of one
class than of the other, which biases the accuracy of a
classifier in favour of the majority class. Streaming data may have inherent
imbalance resulting from the nature of dataspace or extrinsic imbalance due to
its nonstationary environment. In streaming data, time-varying class priors
may shift the imbalance ratio. Researchers have studied
ensemble learning, online learning, class imbalance, and cost-sensitive
algorithms independently, but have rarely addressed all of these
issues jointly to deal with imbalance shifts in nonstationary data. This
paper presents a novel approach that combines these perspectives to maximize
G-mean in nonstationary data with Recurrent Imbalance Shifts (RIS). This
research modifies two state-of-the-art boosting algorithms: 1) AdaC2, to obtain
G-mean based Online AdaC2 for Recurrent Imbalance Shifts (GOA-RIS) and
Ageing and G-mean based Online AdaC2 for Recurrent Imbalance Shifts (AGOA-RIS),
and 2) CSB2, to obtain G-mean based Online CSB2 for Recurrent Imbalance Shifts (GOC-RIS)
and Ageing and G-mean based Online CSB2 for Recurrent Imbalance Shifts (AGOC-RIS).
The study empirically and statistically analyses the performance of the
proposed algorithms against the Online AdaC2 (OA) and Online CSB2 (OC) algorithms using
benchmark datasets. The results demonstrate that the proposed algorithms
outperform OA and OC overall.
Keywords: Cost-sensitive algorithms,
data stream classification, imbalanced data, online learning, population shift,
skewed data stream.
Received March 23, 2019; accepted April 13, 2020
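The abstract's remark that accuracy is biased toward the majority class, and the reason G-mean is the target metric, can be illustrated with a minimal sketch. It assumes the standard binary definition of G-mean as the geometric mean of the recall on the positive class and the recall on the negative class. The function names and the toy labels below are hypothetical and are not taken from the paper's experiments.

```python
import math


def class_recalls(y_true, y_pred, positive=1):
    """Return (recall on the positive class, recall on the negative class)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    # Guard against a class that is absent from y_true.
    pos_recall = tp / (tp + fn) if (tp + fn) else 0.0
    neg_recall = tn / (tn + fp) if (tn + fp) else 0.0
    return pos_recall, neg_recall


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the two per-class recalls (binary case)."""
    pos_recall, neg_recall = class_recalls(y_true, y_pred, positive)
    return math.sqrt(pos_recall * neg_recall)


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy imbalanced data (illustrative only): 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class.
pred_a = [0] * 100

# Classifier B finds 4 of 5 minority samples but mislabels 10 majority samples.
pred_b = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

print("A: accuracy =", accuracy(y_true, pred_a), "G-mean =", g_mean(y_true, pred_a))
print("B: accuracy =", accuracy(y_true, pred_b), "G-mean =", g_mean(y_true, pred_b))
```

In this toy setting, classifier A scores higher on accuracy than classifier B although it never detects a minority sample, so its G-mean is zero. Classifier B gives up some accuracy but keeps both per-class recalls high, which G-mean rewards.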