Universality of Serial Histograms
Yannis Ioannidis
Date published: 
Published In: 
19th Int’l VLDB Conference, Dublin, Ireland, August 1993, pp. 256-267. (Received "10-Year Best Paper" Award in VLDB 2003.)
Conference Article

Many current relational database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. The errors that exist in the histogram approximations directly or transitively affect many estimates derived by the database system. We identify the class of serial histograms and demonstrate that they are optimal for reducing the query result size error for several classes of queries when the actual query result size (and hence the value of that error) reaches some extreme. Specifically, serial histograms are shown to be optimal for arbitrary tree equality-join queries when the query result size is maximized, whether or not the attribute independence assumption holds, and when the query result size is minimized and the attribute independence assumption holds. We also show that the expected error for any such query is always zero under all histograms, and thus argue that histograms should be chosen based on the reduction of the extreme-cases error, since reduction of the expected error is meaningless.

Related files: 

MaDgIK 2009-2018