Last Updated: Oct 25, 2002

The COKO-KOLA Approach to Query Optimizer Development

Query optimizers generate plans to retrieve data requested by queries. Optimizers are hard to build because for any given query, there can be a prohibitively large number of plans to choose from. One technique for coping with this complexity is to divide optimization into two phases: a heuristic phase that narrows the space of plans to consider, and a cost-based phase that compares the relative merits of plans that lie in the narrowed space.

The heuristic phase of optimization uses query-to-query rewrites to transform queries into equivalent expressions that are more amenable to plan generation. This process has proven to be error-prone in the past, even when queries have been confined to the relatively simple Relational Data Model [Cod70]. Rewrites over nested queries and queries returning duplicates have been especially problematic, as evidenced by the well-known COUNT bug revealed of the query rewrites for nested queries presented in [Kim82]. The advent of object databases (i.e., object-oriented and object-relational databases) only exacerbates this issue by introducing more complex data and by implication, more complex queries and more complex query rewrites.

We introduce a novel approach to expressing query rewrites that permits formal reasoning about their correctness. Unlike existing approaches which express query rewrites fully or partially with programming language code, this approach enables verification of query rewrites with a theorem prover. At its foundation lies KOLA: our combinator-based (i.e., variable-free) query algebra that permits expression of simple query rewrites (rewrite rules) without code. Rewrite rules are formal, but lack the expressivity to capture many query rewrites that get used in practice. We address this issue in two ways:

This page describes our completed and on-going work in this area. The first section lists the papers we've written describing the development and features of COKO and KOLA. In the second section, we include source code for various implementations of our work. In the third section, we present a formal specification of KOLA expressed in Larch [GHG+93] as well as theorem prover scripts that verify well over 500 query rewrite rules expressed over KOLA queries. In the fourth section, we show the project members and in the fifth section we show related work in the field.

This material is based upon work supported by the National Science Foundation under Grant No. 9984960. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Jump to...