Skip to main content

Universality of Serial Histograms

Many current relational database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. The errors that exist in the histogram approximations directly or transitively affect many estimates derived by the database system. We identify the class of serial histograms and demonstrate that they are optimal for reducing the query result size error for several classes of queries when the actual query result size (and hence the value of that error) reaches some extreme. Specifically, serial histograms are shown to be optimal for arbitrary tree equality-join queries when the query result size is maximized, whether or not the attribute independence assumption holds, and when the query result size is minimized and the attribute independence assumption holds. We also show that the expected error for any such query is always zero under all histograms, and thus argue that histograms should be chosen based on the reduction of the extreme-cases error, since reduction of the expected error is meaningless.

Yannis Ioannidis, "Universality of Serial Histograms ", 19th Int’l VLDB Conference, Dublin, Ireland, August 1993, pp. 256-267. (Received "10-Year Best Paper" Award in VLDB 2003.), 1993
Published at
19th Int’l VLDB Conference, Dublin, Ireland, August 1993, pp. 256-267. Received
Related research area
No related research area
Related Organizations
No related organizations