DiVA: indexing high-dimensional data by Diving into Vector Approximations
Contemporary multimedia, scientific and medical applications use indexing structures to access their high-dimensional data. Yet, in sufficiently high-dimensional spaces, conventional tree-based access methods are eventually outperformed by simple serial scans. Vector quantization has been used effectively to index data that are mostly uniformly distributed. However, in real-world applications, clustered data and skewed query distributions are the norm. In this paper, we propose DiVA, an approach that selectively adapts the quantization step to accommodate varying indexing needs. This adaptation mechanism triggers the restructuring and possible expansion of DiVA so as to provide finer indexing granularity and enhanced access performance in certain “hot” areas of the search space. User-supplied policies help both identify such “hot” areas and satisfy versatile application requirements. Experimentation with our detailed prototype shows that, on a real-world data set, DiVA requires up to 64% less I/O than competing methods such as the VA-file and the A-tree.
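To make the idea of adapting the quantization step concrete, the following is a minimal illustrative sketch, not DiVA's actual data structure or algorithm. It assumes data normalized to [0, 1) per dimension and a uniform grid; the function names `quantize`, `cell_bounds` and `refine` are hypothetical. A vector is first approximated by a coarse grid cell, and a vector falling in a "hot" cell is then re-quantized inside that cell with extra bits per dimension.

```python
from typing import List, Sequence, Tuple


def quantize(vec: Sequence[float], bits: int,
             lo: float = 0.0, hi: float = 1.0) -> Tuple[int, ...]:
    """Map each coordinate to one of 2**bits equal-width slices of [lo, hi]."""
    cells = 1 << bits
    width = (hi - lo) / cells
    # Clamp so that values on or beyond the borders land in a valid slice.
    return tuple(min(cells - 1, max(0, int((x - lo) / width))) for x in vec)


def cell_bounds(approx: Sequence[int], bits: int,
                lo: float = 0.0, hi: float = 1.0) -> List[Tuple[float, float]]:
    """Return the (lower, upper) interval covered by a cell in each dimension."""
    width = (hi - lo) / (1 << bits)
    return [(lo + a * width, lo + (a + 1) * width) for a in approx]


def refine(vec: Sequence[float], coarse_bits: int,
           extra_bits: int) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Quantize coarsely, then again inside the coarse cell (assumed 'hot')."""
    coarse = quantize(vec, coarse_bits)
    bounds = cell_bounds(coarse, coarse_bits)
    fine = tuple(
        quantize([x], extra_bits, lo=b_lo, hi=b_hi)[0]
        for x, (b_lo, b_hi) in zip(vec, bounds)
    )
    return coarse, fine


if __name__ == "__main__":
    point = [0.2, 0.7]
    print(quantize(point, bits=2))                      # coarse approximation
    print(refine(point, coarse_bits=2, extra_bits=1))   # coarse plus finer level
```

In this sketch the finer level is computed for any vector passed to `refine`; deciding which cells count as "hot", which the abstract attributes to user-supplied policies, is deliberately left out.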