Adaptive Compression for Fast Scans on String Columns
State-of-the-art OLAP systems tend to use columnar data representations, as these are both suitable for analytics and amenable to compression. Local dictionary value encoding has been shown to achieve high compression rates for string columns while still allowing fast filtered scans. In this paper, we argue that the effectiveness and efficiency of local dictionary compression is limited by data repetition across file blocks and by dictionary look-ups inside each block during filtered scan execution. To address this problem, we introduce an adaptive compression technique that is based on differential dictionaries and targets both storage efficiency and query performance. The proposed scheme reduces dramatically the need to store repeated values across different file blocks and significantly accelerates read operations by reducing the time needed for dictionary look-ups. A preliminary set of experiments has given very promising results, showing that, in many cases, the proposed new dictionary compression scheme is much more efficient than existing techniques, occasionally up to an order of magnitude.