Data Provenance for PL/pgSQL

Provenance in SQL describes the relationship between the output of a query and its input data (tables). In recent research [1], we used language-level SQL rewriting to derive fine-grained where- and why-provenance for the output data of SQL SELECT queries.
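To make the two notions concrete, the following is a minimal Python sketch, not the rewriting approach of [1] itself. It assumes that where-provenance records the input cells an output value was copied from, and that why-provenance records the input cells inspected to decide that the output exists. The `Cell` type, the `load` helper, the `employees` table and the hand-written query function are all hypothetical and serve illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A value annotated with the input cells it depends on (hypothetical type)."""
    value: object
    where: frozenset = frozenset()  # cells the value was copied from
    why: frozenset = frozenset()    # cells inspected to decide the value is in the output


def load(table_name, rows):
    """Annotate every input cell with its own identifier (table, row number, column)."""
    return [
        {col: Cell(val, where=frozenset({(table_name, i, col)}))
         for col, val in row.items()}
        for i, row in enumerate(rows, start=1)
    ]


def select_name_where_salary_above(employees, threshold):
    """Hand-written analogue of: SELECT name FROM employees WHERE salary > threshold."""
    result = []
    for row in employees:
        if row["salary"].value > threshold:
            name = row["name"]
            # The output cell is copied from `name` (where) and exists only
            # because the predicate inspected `salary` (why).
            result.append(Cell(name.value, name.where, name.why | row["salary"].where))
    return result


employees = load("employees", [
    {"name": "Ada", "salary": 120},
    {"name": "Bob", "salary": 80},
])
output = select_name_where_salary_above(employees, 100)
for cell in output:
    print(cell.value, sorted(cell.where), sorted(cell.why))
```

In this sketch the single output cell has where-provenance pointing at the `name` cell of row 1 and why-provenance pointing at the `salary` cell of the same row. The approach in [1] obtains such annotations by rewriting the SQL query rather than by hand-coding each query.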

PL/pgSQL is a procedural language for PostgreSQL which allows us to write functions using procedural control structures (LOOP, IF, EXCEPTION) on top of embedded SQL statements (SELECT queries, but also INSERT, UPDATE and DELETE statements).

Data dependencies in PL/pgSQL programs with multiple SQL statements and intermediate modifications of the database state can become very complex. Deriving provenance for complete PL/pgSQL programs, not only for individual queries, helps to observe and debug such program behaviour and data dependencies.
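The following Python sketch illustrates why intermediate state changes complicate provenance. It simulates a small procedural body with an IF and an UPDATE, and carries annotations along with the modified values. It is only an illustration under stated assumptions: the `Cell` type, the `source` helper, the `accounts` and `settings` tables, and in particular the choice to attribute the IF condition to why-provenance are hypothetical and do not describe the approach this thesis is expected to develop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A value annotated with the input cells it depends on (hypothetical type)."""
    value: object
    where: frozenset = frozenset()  # cells the value was computed or copied from
    why: frozenset = frozenset()    # cells inspected to decide how the value came about


def source(table, row, col, value):
    """Create an input cell whose where-provenance is its own identifier."""
    return Cell(value, where=frozenset({(table, row, col)}))


# Initial database state: two one-row tables keyed by row number.
accounts = {1: {"balance": source("accounts", 1, "balance", 100)}}
settings = {1: {"bonus": source("settings", 1, "bonus", 25),
                "enabled": source("settings", 1, "enabled", True)}}


def add_bonus(accounts, settings):
    """Analogue of a PL/pgSQL body such as:
         IF (SELECT enabled FROM settings) THEN
           UPDATE accounts SET balance = balance + (SELECT bonus FROM settings);
         END IF;
    """
    enabled = settings[1]["enabled"]
    bonus = settings[1]["bonus"]
    if enabled.value:
        for row in accounts.values():
            old = row["balance"]
            # The new balance is computed from the old balance and the bonus
            # (where); the update happened because `enabled` was inspected (why).
            row["balance"] = Cell(
                old.value + bonus.value,
                where=old.where | bonus.where,
                why=old.why | bonus.why | enabled.where | enabled.why,
            )


add_bonus(accounts, settings)
balance = accounts[1]["balance"]  # a later statement reading the modified state
print(balance.value, sorted(balance.where), sorted(balance.why))
```

A later read of `balance` now depends on cells from both tables, although the reading statement itself only mentions `accounts`. Dependencies of this kind are invisible when each query is analysed in isolation.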

The task of this thesis is to

extend the query rewrite approach of [1] to derive data provenance for SQL DML statements and PL/pgSQL control structures and
implement the approach in Haskell, based on an existing PL/pgSQL parser and an existing implementation of the query rewriter described in [1].

[1] Müller, Dietrich, Grust: You Say ‘What’, I Hear ‘Where’ and ‘Why’ — (Mis-)Interpreting SQL to Derive Fine-Grained Provenance

Contact

Benjamin Dietrich