Research

Current Research Projects

Recursive Computation Over Relational Data

Move your computation close to the data! This age-old mantra of the database community asserts that we can expect a SQL query engine with immediate access to the data to perform significantly better than an external processor to which we have to ship the data first. The lore holds up if the computation is query-like and primarily involves filtering, data (re-)combination, grouping, or aggregation. It is much less clear how complex algorithms that rely on arbitrary iterative control flow or recursion can be efficiently evaluated inside the same SQL engines.

Rethinking Fixpoint Computation

SQL:1999 introduced WITH RECURSIVE—or recursive common table expressions (CTEs)—a true game changer which turned SQL into a Turing-complete programming language. WITH RECURSIVE is powerful and versatile, but proved to be notoriously hard to grasp and master. In this line of work, we derive new CTE variants from the simple loop-based operational semantics of SQL:1999’s WITH RECURSIVE. In the absence of fixpoint-based semantics and monotonicity restrictions, these CTE variants enable a SQL authoring style that mimics imperative algorithms.

UDF Compilation

“Move your computation close to the data” is an age-old mantra for developing efficient database-driven applications. However, following this advice requires a very intimate knowledge of the database query language at hand. In this branch of our research, we use techniques developed in other areas of computer science, such as programming languages, to enable you to write complex in-database computation without having to read the SQL standard from cover to cover.

Past Research Projects

Data Provenance for SQL

We explore new ways to derive the provenance (or lineage) of data items that flow through programs or queries. Once this provenance information has been derived, we know exactly which input items led the program (or query) to emit which output items (Why and Where Provenance), as well as which program parts were involved in the computation of each single item (How Provenance). Our exploration started with the analysis and instrumentation of Python programs used in Scientific Data Processing (in the context of the ScienceCampus Tübingen).

Database Supported Haskell

Database-Supported Haskell, DSH for short, is a Haskell library for database-supported program execution. Using the DSH library, a relational database management system (RDBMS) can be used as a coprocessor for the Haskell programming language, especially for those program fragments that carry out data-intensive and data-parallel computations. Rather than embedding a relational language into Haskell, DSH turns idiomatic Haskell programs into SQL queries. DSH in the Real World We have used DSH for large scale data analysis.

Ferry

Database-Supported Program Execution With project Ferry we try to establish a connection between two somewhat distant shores: programming languages and database technology. Ferry explores how far we can push the idea of relational database engines that directly and seamlessly participate in program evaluation to support the super-fast execution of data-intensive programs written in a variety of (general purpose) programming languages. Ferry builds on technology developed in the context of our project Pathfinder.

Habitat

True Language-Level SQL Debugging We present Habitat, a declarative observational debugger for SQL. Habitat facilitates true language-level (not: plan-level) debugging of, probably flawed, SQL queries that yield unexpected results. Users may mark arbitrary SQL subexpressions-ranging from literals, over fragments of predicates, to entire subquery blocks-to observe whether these evaluate as expected. The SQL query is transformed to additionally record the values of marked subexpressions during the process of evaluation. The notion of SQL subexpressions as functions of free variables allows to locate the observed values in the query evaluation process, in order to merge multiple observations into a single (nested) tabular display.

Pathfinder

A Purely Relational XQuery Processor We believe that relational databases are the most researched and best engineered query processing infrastructures available today. They are able to efficiently query tons of data. By using a relational database as runtime environment for an XQuery processor we can port 30+ years of research to the XQuery domain and build a processor that is able to scale well with increasing input sizes. Pathfinder is a re-targetable query compiler that turns XQuery expressions into table algebra queries.

Software/Hardware Codesign

These days software developers as well as hardware designers tend to live in their own worlds: Advancing abstraction let’s software developers forget about underlying hardware details. Hardware designers in turn get out of touch with recent software development approaches. As a consequence - even big companies having hardware as well as software development under one roof - are wasting valuable performance. At the interface between hardware and software development occur opportunities to gain significant advantages through development of methods, best practices and tools.