Data Provenance for Recursive SQL Queries

Tobias Müller, Torsten Grust, Benjamin Dietrich

Proceedings of the 14th International Workshop on Theory and Practice of Provenance (TaPP 2022), collocated with SIGMOD 2022, Philadelphia, PA, USA, June 2022.

The adoption of recursion in SQL—framed either in terms of recursive common table expressions (CTEs) or recursive user-defined functions (UDFs)—marked a jump in the expressivity of the query language. The resulting queries can perform complex computation close to database-resident data but, at the same time, often prove challenging to understand and debug. We build on earlier work on the derivation of where- and why-provenance for complex (yet non-recursive) SQL queries to also embrace recursive SQL CTEs and UDFs. Fine-grained data provenance for recursive SQL is derived through language-level query rewriting and a two-phase evaluation strategy that does not invade the underlying RDBMS.