Pruning Based Interestingness of Mined Classification Patterns
Ahmed Al-Hegami
Department of Computer Science, University of Sana'a, Yemen
Abstract: Classification is an important problem in data mining, and decision tree induction is one of the most common techniques applied to solve it. Many decision tree induction algorithms have been proposed, based on different attribute selection and pruning strategies. Although the patterns induced by decision trees are easier to interpret and comprehend than those induced by other classification algorithms, the constructed trees may contain hundreds or thousands of nodes, which makes them difficult for the user who examines the patterns to comprehend and interpret. For this reason, the question of how to construct a tree appropriately and how to provide good pruning criteria has long been a topic of considerable debate. The main objective of such criteria is to create a tree whose classification accuracy on unseen data is maximized and whose size is minimized. Most decision tree algorithms first apply a splitting criterion to construct a tree and then prune it to obtain an accurate, simple, and comprehensible tree. Even after pruning, the constructed decision tree may be extremely large and may reflect patterns that are not interesting from the user's point of view. In many scenarios, users are interested only in obtaining patterns that are interesting; such users may prefer a simple, interpretable, but only approximate decision tree to an accurate tree that involves a lot of detail. In this paper, we propose a pruning approach that captures user subjectivity to discover interesting patterns. The approach computes the subjective interestingness of patterns and uses it as a pruning criterion to prune away uninteresting ones. The proposed framework helps in reducing the size of the induced model and in maintaining the model. One feature of the proposed approach is that it captures the user's background knowledge, which is monotonically augmented. The experimental results are quite promising.
Keywords: Knowledge discovery in databases, data mining, decision tree, domain knowledge, interestingness, novelty measure.
Received December 9, 2007; accepted March 30, 2008