Data Provenance for SQL

We explore new ways to derive the provenance (or lineage) of data items that flow through programs or queries. Once this provenance information has been derived, we know

  1. exactly which input items led the program (or query) to emit which output items (Why and Where Provenance), as well as
  2. which program parts were involved in the computation of each single item (How Provenance).
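
To make the first of these two notions concrete, the following minimal Python sketch annotates every cell with the input cells it stems from. It is an illustration under stated assumptions, not the technique developed in the publications below: the names `Cell`, `load`, and `select_where` are hypothetical, and the sketch assumes that Where Provenance records the input cells an output value was copied from, while Why Provenance records the input cells that were inspected to decide that the value is emitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A value annotated with provenance (hypothetical helper type)."""
    value: object
    where: frozenset = frozenset()  # input cells the value was copied from
    why: frozenset = frozenset()    # input cells inspected to decide on emission


def load(table_name, rows):
    """Tag every input cell with its own identifier (table, row index, column)."""
    return [
        {col: Cell(val, where=frozenset({(table_name, i, col)}))
         for col, val in row.items()}
        for i, row in enumerate(rows)
    ]


def select_where(rows, out_col, pred_col, predicate):
    """Roughly: SELECT out_col FROM rows WHERE predicate(pred_col)."""
    result = []
    for row in rows:
        tested = row[pred_col]
        if predicate(tested.value):
            src = row[out_col]
            # The predicate inspected `tested`, so the origins of `tested`
            # are added to the why-provenance of the emitted cell.
            result.append(Cell(src.value, src.where, src.why | tested.where))
    return result


employees = load("emp", [
    {"name": "Ada", "salary": 120},
    {"name": "Bob", "salary": 80},
])
out = select_where(employees, "name", "salary", lambda s: s > 100)
```

Under these assumptions, the single output cell `"Ada"` carries the `name` cell of row 0 as its where-provenance and the `salary` cell of row 0 as its why-provenance. How Provenance, which tracks the program parts involved, is not modelled in this sketch.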

Our exploration started with the analysis and instrumentation of Python programs used in scientific data processing (in the context of the ScienceCampus Tübingen). We are now adapting the resulting techniques so that they derive data provenance for relational queries, SQL in particular. This has the potential to yield very fine-grained provenance information for substantially larger SQL dialects than have been considered so far.

Publications

Data Provenance for Recursive SQL Queries

Tobias Müller • Torsten Grust • Benjamin Dietrich

Proceedings of the 14th International Workshop on Theory and Practice of Provenance (TaPP 2022), co-located with SIGMOD 2022, Philadelphia, PA, USA, June 2022.

How, Where, and Why Data Provenance Improves Query Debugging -- A Visual Demonstration of Fine-Grained Provenance Analysis for SQL

Tobias Müller • Pascal Engel

Proceedings of the 38th IEEE Int’l Conference on Data Engineering (ICDE 2022), Kuala Lumpur, Malaysia, May 2022.

Detached Provenance Analysis

Tobias Müller

PhD Thesis, Universität Tübingen, 2020.

You Say ‘What’, I Hear ‘Where’ and ‘Why’ — (Mis-)Interpreting SQL to Derive Fine-Grained Provenance

Tobias Müller • Benjamin Dietrich • Torsten Grust

Proceedings of the 44th Int’l Conference on Very Large Databases. PVLDB 11(11), pages 1536–1549. Rio de Janeiro, Brazil, August 2018.

How ‘How’ Explains What ‘What’ Computes — How-Provenance for SQL and Query Compilers

Daniel O'Grady • Tobias Müller • Torsten Grust

10th USENIX Workshop on Theory and Practice of Provenance (TaPP 2018), London, UK, July 2018.

Have Your Cake and Eat it, Too: Data Provenance for Turing-Complete SQL Queries

Tobias Müller

Proceedings of the VLDB 2016 PhD Workshop, New Delhi, India, September 2016.

The Best Bang for Your Bu(ck)g — When SQL Debugging and Data Provenance Go Hand in Hand

Benjamin Dietrich • Tobias Müller • Torsten Grust

Proceedings of the 19th Int’l Conference on Extending Database Technology (EDBT 2016), Bordeaux, France, March 2016.

Provenance for SQL Based on Abstract Interpretation: Value-less, but Worthwhile

Tobias Müller • Torsten Grust

Proceedings of the 41st Int’l Conference on Very Large Databases (VLDB 2015), Kohala Coast, Hawaii, USA, August 2015.

Where- und Why-Provenance für syntaktisch reiches SQL durch Kombination von Programmanalysetechniken

Tobias Müller

Proceedings of the 27th GI-Workshop Grundlagen von Datenbanken, Gommern, Germany, May 26-29, 2015.