Data Provenance for SQL
We explore new ways to derive the provenance (or lineage) of data items that flow through programs or queries. Once this provenance information has been derived, we know
- exactly which input items led the program (or query) to emit which output items (Why and Where Provenance), as well as
- which program parts were involved in the computation of each individual item (How Provenance).
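To make the three kinds of provenance concrete, here is a toy Python sketch. It evaluates the hypothetical query SELECT k, SUM(v) AS s FROM t WHERE v > 10 GROUP BY k over a made-up four-row table t. It attaches provenance to each output row by hand: where-provenance (the input cells whose values flow into each output cell), why-provenance (the input cells that WHERE and GROUP BY inspect to decide that the row exists), and how-provenance (the query clauses involved in computing each output cell). The table, the function name `query_with_provenance`, and these simplified provenance definitions are assumptions made for illustration. This is not the derivation technique of the publications listed below.

```python
from collections import defaultdict

# Hypothetical input table t(id, k, v); a cell is identified by (row id, column).
T = [
    {"id": 1, "k": "a", "v": 5},
    {"id": 2, "k": "a", "v": 20},
    {"id": 3, "k": "b", "v": 15},
    {"id": 4, "k": "a", "v": 30},
]


def query_with_provenance(rows, threshold=10):
    """Evaluate SELECT k, SUM(v) AS s FROM t WHERE v > threshold GROUP BY k
    and annotate each output row with (simplified) provenance."""
    groups = defaultdict(list)
    for row in rows:
        # WHERE v > threshold: only qualifying rows can contribute.
        if row["v"] > threshold:
            groups[row["k"]].append(row)

    result = []
    for key, members in sorted(groups.items()):
        ids = [r["id"] for r in members]
        result.append({
            "k": key,
            "s": sum(r["v"] for r in members),
            # Where: input cells whose values are copied (k) or aggregated (s)
            # into the output cell.
            "where": {"k": {(i, "k") for i in ids},
                      "s": {(i, "v") for i in ids}},
            # Why: cells inspected by WHERE (v) and GROUP BY (k) to decide
            # that this output row exists.
            "why": {(i, "v") for i in ids} | {(i, "k") for i in ids},
            # How: query clauses involved in computing each output cell
            # (identical for every row here because the query has no branches).
            "how": {"k": ["WHERE", "GROUP BY"],
                    "s": ["WHERE", "GROUP BY", "SUM"]},
        })
    return result


for out in query_with_provenance(T):
    print(out["k"], out["s"], sorted(out["where"]["s"]))
# a 50 [(2, 'v'), (4, 'v')]
# b 15 [(3, 'v')]
```

Row 1 (v = 5) shows up in no provenance set because it fails the WHERE predicate and therefore cannot influence any output row.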
Our exploration started with the analysis and instrumentation of Python programs used in scientific data processing (in the context of the ScienceCampus Tübingen). We are now adapting and transferring the resulting techniques so that they apply to the derivation of data provenance for relational queries, SQL in particular. This opens up the potential to derive very fine-grained provenance information for substantially larger SQL dialects than have been considered so far.
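One generic way to instrument an unmodified Python program for provenance is to wrap input values in objects that carry the set of input items they were derived from. Overloaded operators then propagate these sets through the computation. The sketch below assumes a hypothetical `Tracked` class, a hypothetical `weighted_total` function, and made-up inputs. It is a swapped-in illustration of dynamic data-flow tracking, not the analysis used in this project.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tracked:
    """Hypothetical wrapper: a value plus the input items it was derived from."""
    value: float
    prov: frozenset = field(default_factory=frozenset)

    @staticmethod
    def _lift(other):
        # Plain constants carry no provenance.
        return other if isinstance(other, Tracked) else Tracked(other)

    def __add__(self, other):
        o = self._lift(other)
        return Tracked(self.value + o.value, self.prov | o.prov)

    __radd__ = __add__  # addition is commutative, so reuse __add__

    def __mul__(self, other):
        o = self._lift(other)
        return Tracked(self.value * o.value, self.prov | o.prov)

    __rmul__ = __mul__  # multiplication is commutative, so reuse __mul__


def weighted_total(prices, qty):
    # An ordinary, uninstrumented program: it does not know about Tracked.
    total = 0
    for p, q in zip(prices, qty):
        total = total + p * q
    return total


prices = [Tracked(2.0, frozenset({"price[0]"})), Tracked(3.5, frozenset({"price[1]"}))]
qty = [Tracked(4, frozenset({"qty[0]"})), Tracked(2, frozenset({"qty[1]"}))]
r = weighted_total(prices, qty)
print(r.value, sorted(r.prov))
# 15.0 ['price[0]', 'price[1]', 'qty[0]', 'qty[1]']
```

Such wrappers only follow data flow. A branch such as `if p.value > 3` inspects the plain value and leaves no trace in any provenance set, so why-provenance additionally requires tracking the program's control-flow decisions.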
Publications
Data Provenance for Recursive SQL Queries
Tobias Müller • Torsten Grust • Benjamin Dietrich
Proceedings of the 14th International Workshop on Theory and Practice of Provenance (TaPP 2022), co-located with SIGMOD 2022, Philadelphia, PA, USA, June 2022.
How, Where, and Why Data Provenance Improves Query Debugging -- A Visual Demonstration of Fine-Grained Provenance Analysis for SQL
Tobias Müller • Pascal Engel
Proceedings of the 38th IEEE Int’l Conference on Data Engineering (ICDE 2022), Kuala Lumpur, Malaysia, May 2022.
You Say ‘What’, I Hear ‘Where’ and ‘Why’ — (Mis-)Interpreting SQL to Derive Fine-Grained Provenance
Tobias Müller • Benjamin Dietrich • Torsten Grust
Proceedings of the 44th Int’l Conference on Very Large Data Bases. PVLDB 11(11), pages 1536–1549. Rio de Janeiro, Brazil, August 2018.
How ‘How’ Explains What ‘What’ Computes — How-Provenance for SQL and Query Compilers
Daniel O'Grady • Tobias Müller • Torsten Grust
10th USENIX Workshop on Theory and Practice of Provenance (TaPP 2018), London, UK, July 2018.
Have Your Cake and Eat it, Too: Data Provenance for Turing-Complete SQL Queries
Proceedings of the VLDB 2016 PhD Workshop, New Delhi, India, September 2016.
The Best Bang for Your Bu(ck)g — When SQL Debugging and Data Provenance Go Hand in Hand
Benjamin Dietrich • Tobias Müller • Torsten Grust
Proceedings of the 19th Int’l Conference on Extending Database Technology (EDBT 2016), Bordeaux, France, March 2016.
Provenance for SQL Based on Abstract Interpretation: Value-less, but Worthwhile
Proceedings of the 41st Int’l Conference on Very Large Data Bases (VLDB 2015), Kohala Coast, Hawaii, USA, August 2015.
Where- und Why-Provenance für syntaktisch reiches SQL durch Kombination von Programmanalysetechniken (Where- and Why-Provenance for Syntactically Rich SQL by Combining Program Analysis Techniques)
Proceedings of the 27th GI-Workshop Grundlagen von Datenbanken, Gommern, Germany, May 26–29, 2015.