Tracking Recurring Concepts from Evolving Data Streams using Ensemble Method
Yange Sun1,2, Zhihai Wang2, Jidong Yuan2, and Wei Zhang2
1School of Computer and Information Technology, Xinyang Normal University, China
2School of Computer and Information Technology, Beijing Jiaotong University, China
Abstract: Ensemble models are the most widely used methods for classifying evolving data streams. However, most existing data stream ensemble classification algorithms do not consider recurring concepts, which commonly occur in real-world applications. Motivated by this challenge, an Ensemble with internal Change Detection (ECD) is proposed to improve performance by exploiting recurring concepts. ECD maintains a pool of classifiers and dynamically adds and removes classifiers in response to a change detector. The detector uses a two-window model that measures the distance between the distributions of old and recent data with the Jensen-Shannon divergence. When a change is detected, the repository of stored historical concepts is checked for reuse. Experimental results on both synthetic and real-world data streams demonstrate that the proposed algorithm not only outperforms state-of-the-art methods on standard evaluation metrics, but also adapts well to different types of concept drift, especially when concepts reappear.
Keywords: Data streams, ensemble classification, change detection, recurring concept, Jensen-Shannon divergence.
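
The two-window change detection idea described in the abstract can be illustrated with a minimal sketch. It is not the ECD implementation: the abstract does not say which distribution is compared or how the decision threshold is chosen, so the sketch assumes windows of discrete (categorical) values, and the function names and the threshold of 0.1 are hypothetical. Only the use of the Jensen-Shannon divergence between an old and a recent window comes from the text.

```python
import math
from collections import Counter


def histogram(window):
    """Empirical distribution of the discrete values in a window."""
    counts = Counter(window)
    n = len(window)
    return {value: count / n for value, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    distributions given as {value: probability} dicts.
    The result lies in [0, 1]; 0 means the distributions are identical."""
    divergence = 0.0
    for value in set(p) | set(q):
        pk = p.get(value, 0.0)
        qk = q.get(value, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution M = (P + Q) / 2
        if pk > 0:
            divergence += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            divergence += 0.5 * qk * math.log2(qk / mk)
    return divergence


def change_detected(old_window, recent_window, threshold=0.1):
    """Signal a change when the divergence between the two windows
    exceeds a threshold (the value 0.1 is an assumption of this sketch)."""
    distance = js_divergence(histogram(old_window), histogram(recent_window))
    return distance > threshold


# Usage: the same mix of values in both windows versus a complete shift.
stable = change_detected(["a", "b"] * 50, ["b", "a"] * 50)
shifted = change_detected(["a"] * 100, ["b"] * 100)
```

In this sketch `stable` evaluates to `False` and `shifted` to `True`. The Jensen-Shannon divergence is symmetric and bounded, which makes a fixed threshold easier to interpret than with the unbounded Kullback-Leibler divergence; continuous attributes would first need to be binned.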