Mining Closed and Multi-Supports-Based Sequential Pattern in High-Dimensional Dataset
Meng Han1,2, Zhihai Wang1, and Jidong Yuan1
1School of Computer and Information Technology, Beijing Jiaotong University, China
2School of Computer Science and Engineering, Beifang University of Nationalities, China
Abstract: Previous mining algorithms for high-dimensional datasets, such as biological datasets, produce very large pattern sets that include short and discontinuous sequential patterns. These patterns do not carry useful information. Mining sequential patterns in such sequences requires considering different forms of patterns, such as contiguous patterns and local patterns, which appear more than once in a particular sequence. Mining closed patterns yields not only a more compact result set but also better efficiency. In this paper, a novel algorithm based on bi-directional extension and multi-supports is proposed specifically for mining contiguous closed patterns in high-dimensional datasets. Three kinds of contiguous closed sequential patterns are mined: sequential patterns, local sequential patterns, and total sequential patterns. Thorough performance studies on biological sequences demonstrate that the proposed algorithm reduces memory consumption and generates compact patterns.
A detailed analysis of the multi-supports-based results is provided in this paper.
Keywords: High-dimensional dataset, closed pattern, contiguous pattern, multi-supports, biological sequences.
Received January 11, 2012; accepted April 29, 2013