A Novel Approach to Maximize G-mean in Nonstationary Data with Recurrent Imbalance Shifts

Radhika Kulkarni1, S. Revathy1, and Suhas Patil2

1Department of Computer Science Engineering, Sathyabama Institute of Science and Technology, India

2Department of Computer Science Engineering, Bharati Vidyapeeth’s College of Engineering, India

Abstract: A major difficulty in classifying nonstationary data is handling class imbalance. Imbalanced data contain far more samples of one class than of the other, which biases a classifier's accuracy in favour of the majority class. Streaming data may have intrinsic imbalance arising from the nature of the data space, or extrinsic imbalance caused by the nonstationary environment. In streaming data, class priors that vary over time may shift the imbalance ratio. Researchers have studied ensemble learning, online learning, class imbalance and cost-sensitive algorithms separately, but have rarely addressed all of these issues together to handle imbalance shift in nonstationary data. This paper presents a novel approach that combines these aspects to maximize G-mean in nonstationary data with Recurrent Imbalance Shifts (RIS). It modifies two state-of-the-art boosting algorithms: 1) AdaC2, to obtain G-mean based Online AdaC2 for Recurrent Imbalance Shifts (GOA-RIS) and Ageing and G-mean based Online AdaC2 for Recurrent Imbalance Shifts (AGOA-RIS), and 2) CSB2, to obtain G-mean based Online CSB2 for Recurrent Imbalance Shifts (GOC-RIS) and Ageing and G-mean based Online CSB2 for Recurrent Imbalance Shifts (AGOC-RIS). The study empirically and statistically analyses the performance of the proposed algorithms against the Online AdaC2 (OA) and Online CSB2 (OC) algorithms on benchmark datasets. The results show that the proposed algorithms outperform OA and OC overall.
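To illustrate why the abstract targets G-mean and not accuracy, the following minimal sketch computes both on a toy imbalanced sample. It assumes the standard binary definition of G-mean (the square root of sensitivity times specificity); the function names and the toy labels are hypothetical, chosen only for illustration, and are not taken from the paper or its experiments.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Binary G-mean: sqrt(sensitivity * specificity) (assumed standard definition)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    # Guard against a class being absent from the sample.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy sample (illustrative only): 95 majority (0) and 5 minority (1) labels.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100

# A classifier that trades some majority errors for minority recall:
# 85 of 95 negatives correct, 4 of 5 positives correct.
balanced = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

for name, y_pred in [("always majority", always_majority), ("balanced", balanced)]:
    print(name, round(accuracy(y_true, y_pred), 3), round(g_mean(y_true, y_pred), 3))
```

On this toy sample the majority-only classifier scores higher on accuracy yet has a G-mean of zero, because it never detects the minority class. That is the bias a G-mean objective is meant to expose.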

Keywords: Cost-sensitive algorithms, data stream classification, imbalanced data, online learning, population shift, skewed data stream.

Received March 23, 2019; accepted April 13, 2020

https://doi.org/10.34028/iajit/18/1/12
Last modified on Thursday, 24 December 2020 05:40