How to Optimize What Is Slow in Data Provenance and Why You Should Do It
In recent years, our group has developed a novel approach to provenance analysis for SQL that is based on query rewriting. Given a query Q whose data provenance is to be analyzed, the rewriter produces two queries, Q1 and Q2; evaluating these two queries yields the data provenance of Q.
The task of this master's thesis is to integrate one or more optimization steps into the query rewriting. Q2 in particular could benefit from static optimization of set expressions.
Queries are evaluated in PostgreSQL, and the query rewriter is implemented in Haskell.
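As a rough illustration of what static optimization of set expressions can mean, the following Haskell sketch simplifies a small AST of set operations using algebraic identities such as R ∪ ∅ = R, R ∩ R = R and R − R = ∅. The type SetExpr and the function simplify are hypothetical and illustrative only; they are not part of the existing rewriter, and the real shape of Q2 may differ. The identities assume set semantics (UNION, INTERSECT, EXCEPT without ALL). Some of them, such as R ∪ R = R, do not hold under the bag semantics of UNION ALL.

```haskell
-- Hypothetical AST for set expressions over relations, as they might
-- appear in a rewritten query. All names here are illustrative.
data SetExpr
  = Rel String                 -- a base relation (table) by name
  | Empty                      -- the empty relation
  | Union SetExpr SetExpr      -- SQL UNION (set semantics)
  | Intersect SetExpr SetExpr  -- SQL INTERSECT (set semantics)
  | Except SetExpr SetExpr     -- SQL EXCEPT (set semantics)
  deriving (Eq, Show)

-- Bottom-up simplification: children are simplified first, then the
-- identities for the parent operator are applied. Because each helper
-- returns either an already simplified child, Empty, or a node whose
-- children admit no further rewrite, one pass reaches a fixpoint.
-- Equality is structural, so it only detects syntactically equal operands.
simplify :: SetExpr -> SetExpr
simplify (Union a b)     = simplifyUnion (simplify a) (simplify b)
simplify (Intersect a b) = simplifyIntersect (simplify a) (simplify b)
simplify (Except a b)    = simplifyExcept (simplify a) (simplify b)
simplify e               = e

-- R ∪ ∅ = R, ∅ ∪ R = R, R ∪ R = R
simplifyUnion :: SetExpr -> SetExpr -> SetExpr
simplifyUnion Empty b = b
simplifyUnion a Empty = a
simplifyUnion a b
  | a == b    = a
  | otherwise = Union a b

-- R ∩ ∅ = ∅, ∅ ∩ R = ∅, R ∩ R = R
simplifyIntersect :: SetExpr -> SetExpr -> SetExpr
simplifyIntersect Empty _ = Empty
simplifyIntersect _ Empty = Empty
simplifyIntersect a b
  | a == b    = a
  | otherwise = Intersect a b

-- ∅ − R = ∅, R − ∅ = R, R − R = ∅
simplifyExcept :: SetExpr -> SetExpr -> SetExpr
simplifyExcept Empty _ = Empty
simplifyExcept a Empty = a
simplifyExcept a b
  | a == b    = Empty
  | otherwise = Except a b
```

For example, `simplify (Union (Except (Rel "r") (Rel "r")) (Rel "s"))` evaluates to `Rel "s"`: the inner difference collapses to the empty relation, which the union then drops. A rewriter could apply such a pass to the set expressions of Q2 before generating SQL, so that PostgreSQL never sees the redundant subqueries.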