On the Propagation of Errors in the Size of Join Results
Query optimizers in current relational database systems rely on statistics that the system maintains about the contents of the database to choose the most efficient access plan for a given query. These statistics contain errors that transitively affect many of the estimates derived by the optimizer. We present a formal framework for studying the principles of this error propagation. Within this framework, we obtain several analytic results on how the error propagates in general, as well as in the extreme and average cases. We also provide results on the guarantees that the database system can make based on the statistics that it maintains. Finally, we discuss some promising approaches to controlling the error propagation and derive several of their properties.
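As a rough illustration of why such errors matter, the sketch below assumes the common textbook model in which the size of a multi-join result is estimated as the product of the relation sizes and the per-join selectivities. This model, the function names, and the uniform per-join error are illustrative assumptions, not the formal framework of the paper. Under these assumptions, a fixed relative error in each selectivity compounds multiplicatively with the number of joins.

```python
from math import prod


def estimated_result_size(relation_sizes, selectivities):
    """Estimate a join result size as (product of relation sizes)
    times (product of join selectivities).

    Assumed textbook model, used here only for illustration.
    """
    return prod(relation_sizes) * prod(selectivities)


def compounded_relative_error(per_join_error, num_joins):
    """Relative error of the size estimate when every join selectivity
    is overestimated by the same factor (1 + per_join_error).

    The errors multiply, so the result is (1 + e)^n - 1.
    """
    return (1.0 + per_join_error) ** num_joins - 1.0


if __name__ == "__main__":
    # Hypothetical numbers: four relations of 1000 tuples, three joins.
    sizes = [1000, 1000, 1000, 1000]
    true_sel = [0.001, 0.001, 0.001]
    # Each selectivity estimate is 10% too high (an assumed error level).
    est_sel = [s * 1.10 for s in true_sel]

    true_size = estimated_result_size(sizes, true_sel)
    est_size = estimated_result_size(sizes, est_sel)

    print("true size:     ", true_size)
    print("estimated size:", est_size)
    print("relative error:", est_size / true_size - 1.0)
    print("closed form:   ", compounded_relative_error(0.10, len(true_sel)))
```

In this toy setting the relative error of the final estimate grows exponentially in the number of joins even though each individual statistic is only slightly off, which is the kind of transitive effect the abstract describes.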