DiVA: indexing high-dimensional data by Diving into Vector Approximations

Jan 2011
Conference

Contemporary multimedia, scientific and medical applications use indexing structures to access their high-dimensional data. Yet, in sufficiently high-dimensional spaces, conventional tree-based access methods are eventually outperformed by simple serial scans. Vector quantization has been effectively used to index data that are mostly distributed uniformly. However, in real-world applications, clustered data and skewed query distributions are the norm. In this paper, we propose DiVA, an approach that selectively adapts the quantization step to accommodate varying indexing needs. This adaptation mechanism triggers the restructuring and possible expansion of DiVA so as to provide finer indexing granularity and enhanced access performance in certain “hot” areas of the search space. User-supplied policies help both identify such “hot” areas and satisfy versatile application requirements. Experimentation with our detailed prototype shows that on a real-world data set, DiVA yields up to 64% reduced I/O compared to competing methods such as the VA-file and the A-tree.
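
To illustrate why a finer quantization step can help in a "hot" area, the following is a minimal sketch of generic vector-approximation filtering in the style of the VA-file. It is not DiVA's actual algorithm. The function names (`quantize`, `cell_bounds`, `lower_bound_distance`), the assumption that coordinates are normalized to [0, 1), and the use of a per-dimension bit count as the "quantization step" are all illustrative choices that do not come from the paper.

```python
from typing import List, Sequence, Tuple


def quantize(vector: Sequence[float], bits: int) -> List[int]:
    """Map each coordinate in [0, 1) to one of 2**bits equal-width cells."""
    cells = 1 << bits
    # min() clamps a coordinate equal to 1.0 into the last cell.
    return [min(int(x * cells), cells - 1) for x in vector]


def cell_bounds(code: Sequence[int], bits: int) -> List[Tuple[float, float]]:
    """Return the (low, high) interval covered by each cell index."""
    cells = 1 << bits
    return [(c / cells, (c + 1) / cells) for c in code]


def lower_bound_distance(
    query: Sequence[float], code: Sequence[int], bits: int
) -> float:
    """Smallest possible Euclidean distance from query to any point in the cell."""
    total = 0.0
    for q, (low, high) in zip(query, cell_bounds(code, bits)):
        if q < low:
            total += (low - q) ** 2
        elif q > high:
            total += (q - high) ** 2
        # If q lies inside [low, high], this dimension contributes nothing.
    return total ** 0.5


# Hypothetical data: one stored point and one query in 2 dimensions.
point = [0.45, 0.95]
query = [0.10, 0.10]

coarse = lower_bound_distance(query, quantize(point, 2), 2)  # 4 cells per dimension
fine = lower_bound_distance(query, quantize(point, 4), 4)    # 16 cells per dimension
true = sum((p - q) ** 2 for p, q in zip(point, query)) ** 0.5

print(coarse, fine, true)
```

In this sketch the finer approximation yields a lower bound closer to the true distance, so a scan over approximations could discard more candidates before fetching the full vectors. Spending extra bits only where queries concentrate is one plausible reading of the adaptive granularity the abstract describes.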

Citation

Konstantinos Tsakalozos, Spiros Evangelatos, Alex Delis, "DiVA: indexing high-dimensional data by Diving into Vector Approximations", Proc. of the 2011 IEEE Int. Conf. on Multimedia and Expo (ICME 2011), 2011

File

ted-icme11.pdf