VParC: A Compression Scheme for Numeric Data in Column-oriented Databases
Ke Yan1, Hong Zhu1, and Kevin Lü2
1School of Computer Science and Technology, Huazhong University of Science and Technology, China
2Brunel University, UK
Abstract: Compression is one of the most important techniques in data
management and is commonly used to improve query efficiency in databases.
However, existing compression algorithms for numeric data in column-oriented
databases have several limitations. First, each algorithm suits only columns
with particular data distributions, not all kinds of columns; second, a column
with an irregular distribution is hard to compress; third, a column compressed
with a heavyweight method cannot be operated on before decompression, which
leads to inefficient queries. Based on the observation that a column is more
likely to exhibit sub-regularity than global regularity, we developed a
compression scheme called Vertically Partitioning Compression (VParC). The
scheme suits columns with different data distributions, and in some cases even
irregular columns. More importantly, data compressed by VParC can be operated
on directly, without prior decompression. This paper presents the details of
the compression and query evaluation approaches, and our experimental results
demonstrate the promising features of VParC.
Keywords: Column-stores, data management, compression, query processing,
analytical workload.
Received August 28, 2013; accepted April 21, 2014