Data Provenance for PL/pgSQL
Provenance in SQL describes the relationship between the output of a query and its input data (tables). In our recent research we used language-level SQL rewriting to derive fine-grained where- and why-provenance for the output data of SQL SELECT queries [1].
PL/pgSQL is a procedural language for PostgreSQL which allows us to write functions using procedural control structures (LOOP, IF, EXCEPTION) on top of embedded SQL statements (SELECT queries, but also INSERT, UPDATE and DELETE statements).
Data dependencies in PL/pgSQL programs with multiple SQL statements and intermediate modifications of the database state can become very complex. Deriving provenance for complete PL/pgSQL programs – not only individual queries – can be very useful in observing and debugging such program behaviour and data dependencies.
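To make this concrete, here is a minimal sketch of such a function. The tables accounts and transfers and the function apply_transfers are hypothetical and invented for illustration only; they are not taken from [1] or from any existing code base.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE accounts  (id int PRIMARY KEY, balance numeric NOT NULL);
CREATE TABLE transfers (id serial PRIMARY KEY, src int, dst int, amount numeric);

CREATE FUNCTION apply_transfers() RETURNS numeric AS $$
DECLARE
  t     record;
  avail numeric;
  total numeric := 0;
BEGIN
  -- LOOP over the result rows of an embedded SELECT query.
  FOR t IN SELECT src, dst, amount FROM transfers ORDER BY id LOOP
    -- This read sees the UPDATEs performed in earlier iterations.
    SELECT balance INTO avail FROM accounts WHERE id = t.src;
    -- If the source account does not exist, avail is NULL and the
    -- IF branch is skipped.
    IF avail >= t.amount THEN
      UPDATE accounts SET balance = balance - t.amount WHERE id = t.src;
      UPDATE accounts SET balance = balance + t.amount WHERE id = t.dst;
      total := total + t.amount;
    END IF;
  END LOOP;
  RETURN total;
END;
$$ LANGUAGE plpgsql;
```

Calling SELECT apply_transfers(); returns a single number, yet that number depends on rows of both tables, on the order in which the transfers are processed, and on balances that were themselves modified by earlier iterations of the loop. Dependencies of this kind cannot be seen by looking at any one of the embedded statements in isolation.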
The task of this thesis is to
- extend the query rewrite approach of [1] to derive data provenance for SQL DML statements and PL/pgSQL control structures, and
- implement the approach in Haskell, based on an existing PL/pgSQL parser and an existing implementation of the query rewriter described in [1].