Selectivity Estimation of Range Queries in Data Streams using Micro-Clustering

Sudhanshu Gupta and Deepak Garg

Computer Science and Engineering Department, Thapar University, India


Abstract: Selectivity estimation is an important task in query optimization. Common data mining techniques are not applicable to large, fast, continuous data streams, because such streams must be processed in a single pass. These constraints make range query estimation a challenging task. We propose a technique that performs range query estimation using micro-clustering. The technique maintains cluster statistics in the form of micro-clusters, each of which also records the distribution of its values using cosine coefficients. These cosine coefficients are used to estimate range queries, including queries whose range of values spans several clusters. The technique has been compared with the cosine series technique for selectivity estimation. Experiments on both synthetic and real datasets of varying sizes show that our technique offers substantial improvements in accuracy over other methods.
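The abstract does not specify how micro-clusters are created, how values are assigned to them, or exactly how the cosine coefficients are combined. The following is therefore only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes one-dimensional values and that each micro-cluster covers a fixed interval of hypothetical half-width `radius` around the first value it absorbs. It also assumes that each cluster keeps `m` cosine coefficients computed with the standard cosine-series density estimate on that interval. The names `MicroCluster`, `StreamRangeEstimator`, `radius` and `m` are illustrative only. The sketch shows why the approach suits streams: every statistic is a running sum updated in one pass, and a range estimate is the sum of per-cluster integrals of the cosine series.

```python
import math


class MicroCluster:
    """Hypothetical 1-D micro-cluster covering a fixed interval [lo, hi].

    Besides a count and linear sum, it keeps running sums of cos(k*pi*t)
    for k = 1..m, where t in [0, 1] is a value rescaled to the cluster's
    interval. All sums are additive, so each arriving value is absorbed
    in a single pass.
    """

    def __init__(self, lo, hi, m):
        self.lo, self.hi, self.m = lo, hi, m
        self.n = 0
        self.linear_sum = 0.0
        self.cos_sums = [0.0] * m

    def _scale(self, x):
        # Map x from [lo, hi] onto [0, 1] (assumes hi > lo).
        return (x - self.lo) / (self.hi - self.lo)

    def centroid(self):
        return self.linear_sum / self.n

    def add(self, x):
        t = self._scale(x)
        self.n += 1
        self.linear_sum += x
        for k in range(1, self.m + 1):
            self.cos_sums[k - 1] += math.cos(k * math.pi * t)

    def estimate(self, a, b):
        """Estimated number of this cluster's values that fall in [a, b]."""
        a, b = max(a, self.lo), min(b, self.hi)  # clip query to the cluster
        if a >= b or self.n == 0:
            return 0.0
        ta, tb = self._scale(a), self._scale(b)
        # Integrate the truncated cosine-series density
        #   f(t) = 1 + 2 * sum_k c_k * cos(k*pi*t),  c_k = cos_sums[k-1] / n
        # over [ta, tb].
        frac = tb - ta
        for k in range(1, self.m + 1):
            c_k = self.cos_sums[k - 1] / self.n
            frac += 2 * c_k * (math.sin(k * math.pi * tb)
                               - math.sin(k * math.pi * ta)) / (k * math.pi)
        # A truncated series can over- or undershoot, so clamp to [0, 1].
        return self.n * min(max(frac, 0.0), 1.0)


class StreamRangeEstimator:
    """Hypothetical one-pass range-count estimator built from micro-clusters."""

    def __init__(self, radius, m=8):
        self.radius = radius  # assumed fixed half-width of each cluster (> 0)
        self.m = m            # assumed number of cosine coefficients
        self.clusters = []

    def insert(self, x):
        hosts = [c for c in self.clusters if c.lo <= x <= c.hi]
        if hosts:
            # Absorb x into the covering cluster with the nearest centroid.
            min(hosts, key=lambda c: abs(x - c.centroid())).add(x)
        else:
            # No cluster covers x: open a new one centered on it.
            c = MicroCluster(x - self.radius, x + self.radius, self.m)
            c.add(x)
            self.clusters.append(c)

    def estimate(self, a, b):
        # Sum per-cluster estimates, so a range may span many clusters.
        return sum(c.estimate(a, b) for c in self.clusters)


if __name__ == "__main__":
    est = StreamRangeEstimator(radius=1.0, m=8)
    data = [i * 0.01 for i in range(1000)]
    for x in data:
        est.insert(x)
    true_count = sum(1 for x in data if 2.5 <= x <= 7.5)
    print(len(est.clusters), "clusters;",
          "estimate:", round(est.estimate(2.5, 7.5), 1), "true:", true_count)
```

Because each cluster fits its cosine series to a narrow interval, a few coefficients can capture local structure that a single global series with the same number of coefficients would smooth out. A practical stream version would also need to bound the number of clusters, for example by merging or expiring them. This sketch omits that step.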

Keywords: Selectivity estimation, range query, data streams, micro-clustering.

Received September 22, 2012; accepted December 24, 2013


Copyright 2006-2009 Zarqa Private University. All rights reserved.
Print ISSN: 1683-3198.