Theses

Students! We are looking forward to work with you on the open topics listed below. You may also contact us if you think that you have identified an interesting challenge that may fit the research profile of our chair.

In either case, please send e-mail to db-theses@cs.uni-tuebingen.de (German is fine). Make sure to include the following information:

  • Which of our chair’s courses did you attend?
  • Beyond database systems, which courses/seminars have particularly raised your interest?
  • Are you a hands-on hacker type or do you prefer working with pencil and paper?
  • Which programming languages are part of your toolbox? Which of these do you master?
  • Did you work with functional programming languages (beyond the bits of Scheme taught in the introductory CS courses)?
  • Any projects you have worked on in the recent past?
  • What would be a suitable date to start the work on your thesis? What are your other commitments (e.g., courses, exams, day job) in the thesis time frame?
  • Please also enclose your transcript of records.

Thanks!

In Progress

Robert Stahl (B.Sc.)

Extend JSON Path Expressions in DuckDB

DuckDB currently supports a bunch of JSON path expressions. However, there are some missing features that are supported by other databases like PostgreSQL. The goal of this thesis is to extend the JSON path expressions in DuckDB to support more features. This includes, but is not limited to recursive path expressions.

Ann-Kathrin Claessens (M.Sc.)

From PostgreSQL to DuckDB: Unusual Queries, their Performance, and Readability

Each year, Quassnoi publishes a blog post (https://explainextended.com/) in which a problem is presented and solved using SQL. The problems are usually quite challenging and require a deep understanding of SQL and the underlying database system. Also, for these problems, SQL is typically not the first choice of language to solve them. This thesis aims to port some of these problem solutions to DuckDB, investigate their performance, and compare the readability of the SQL queries.

Fredo Hogen (B.Sc.)

How expressive is Flummi really? Can it run Advent of Code?


2024

Nico Faden (B.Sc.)

Bringing ORDINALITY to DuckDB

Ludwig Kolesch (B.Sc.)

Bringing Flummi to WITH MUTUALLY RECURSIVE on Materialize DB

Our Flummi compiler currently emits SQL code based on WITH RECURSIVE. Aim of this bachelor thesis is to bring this code to Materialize DB using its specialized variant of recursive CTEs: WITH MUTUALLY RECURSIVE.

Markus Holder (B.Sc.)

Bringing Flummi to umbra.trampoline

Our Flummi compiler currently emits SQL code based on WITH RECURSIVE. Aim of this bachelor thesis is to bring this code to Umbra using its specific tool for iteration: umbra.trampoline().

Felix Kofink (M.Sc.)

Database-Backed Computation Over Weather Data

Björn Bamberg (M.Sc.)

DuckDB: Improved Support for List Functions

DuckDB is an open source in-process SQL OLAP database management system and comes with support for nested values such as arrays. However, there are certain features that are not implemented at this time. This thesis aims to identify these missing features, implement and test them, and ideally have them merged. This project requires knowledge of C++ development as well as knowledge of database systems internals (i.e., as discussed in DB2).

Xenia Wetzel (B.Sc.)

Writing an Interpreter for a Database-Coupled Language


2023

Zora Pidde (B.Sc.)

Extending ByePy with geometric types and operators

In an effort to demonstrate the broader applicability of SQL as a compilation target our group, over the past year, has extended our original PL\SQL to SQL compiler with Byepy — a Python frontend. The currently support data types only include the very basics — integers, float, strings, booleans, composite types, and arrays. This selection already covers quite a few usecases, but many database system support fancier data types like dates, 2D shapes, JSON, etc.

Adrian Müller (M.Sc.)

Extending WITH RECURSIVE With Multiple Working Tables

The WITH RECURSIVE clause of SQL typically allows only a single working table. In this work, we will investigate what happens if we remove this restriction and allow any number of working tables to read from and write to. This project requires knowledge of C development as well as knowledge of database systems internals (i.e., as discussed in DB2).

Alexander Götz (B.Sc.)

From PL/pgSQL to C: What Are the Optimizations?

PL/pgSQL is an imperative language extension that is implemented as an interpreter layered on-top of PostgreSQL. This introduces friction during execution because for each embedded statement, an executor must be instantiated, run, and finally purged. In our research, we have developed methods to eliminate this problem by Compiling PL/SQL Away, which generates a single SQL query. In this thesis, we investigate what happens when we compile PL/pgSQL functions into equivalent C functions, thus eliminating the interpretation overhead of the PL/pgSQL interpreter.

Romain Carl (B.Sc.)

Implementing Abstract Machines in SQL

Abstract machines allow for detailed and precise analysis of how computer systems work. Abstract machines for functional programming languages include the SECD machine (1964) and the Krivine machine (1985). This thesis explores the implementation of these abstract machines in SQL using PostgreSQL and DuckDB.

Lukas Sailer (M.Sc.)

Mapping GraphQL Queries to SQL

GraphQL (or: GQL) is a query language for APIs, offering the ability to replace inefficient request-by-request API calls by bulk-oriented request processing. GraphQL understands links between API resources and can follow such references. When these resources are held in a relational database, can a given GQL query be systematically translated into a single SQL query that traverses relationships and collects the requested resources? Can we find a translation from GQL to SQL that avoids the infamous n+1 problem when multi-valued references are traversed?


2022

Tim Fischer (M.Sc.)

ByePy: Compilation of Python to SQL

Over the the last few years, our group has developed a compilation approach to compile PL/SQL UDFs to plain SQL queries. Post compilation, evaluation entirely happens on the SQL side of the fence. In particular, we use trampolined style to compile arbitrarily nested iterative control flow in PL/SQL into SQL’s recursive common table expressions. By applying the exact same compilation techniques to database-driven Python code, computations can be moved directly into the DBMS and thus close to the data they are working with.

Phil Elgert (M.Sc.)

Creating a Website with a Static Site Generator

Daniel Täsch (M.Sc.)

How to Optimize What Is Slow in Data Provenance and Why You Should Do It

In recent years, our group has developed a novel approach to provenance analysis for SQL. This approach is based on query rewriting. Given a query Q to be analyzed for its data provenance, the rewritten queries Q1 and Q2 are produced. Through evaluation of these two queries, data provenance is derived. The task of this Master thesis is to integrate one (or multiple) optimization steps in the query rewriting. Especially Q2 could benefit from static optimization of set expressions.

Marcus Huber (M.Sc.)

Optimization of PL/pgSQL Translations Using Batching

Our compilation approach compiles PL/SQL UDFs to plain SQL queries. We use trampolined style to compile arbitrarily nested iterative control flow in PL/SQL into SQL’s recursive common table expressions. Batched execution of function invocations as well as using multiple recursive anchors potentially improves performance of these translations. After manually compiling several examples, an experimental run-time evaluation should clarify in which cases batching improves performance.

Madeleine Mauz (M.Sc.)

Using Hashtables in Functional-Style UDF Compiled CTEs


2021

Felix Kofink (B.Sc.)

Constructing static websites based on AirTable databases

Thora Daneyko (B.Sc.)

Data Provenance for PL/pgSQL

Provenance in SQL describes the relationship between the output of a query and its input data (tables). In our recent research we employed an approach of language-level SQL rewriting to derive fine-grained where- and why- provenance for the output data of SQL SELECT-queries. [1] PL/pgSQL is a procedural language for PostgreSQL wich allows us to write functions using procedural control structures (LOOP, IF, EXCEPTION) on top of embedded SQL statements (SELECT-queries, but also INSERT, UPDATE and DELETE statements).

Jonatan Braun (M.Sc.)

Extend SQLite3 with Support for LATERAL Join

SQLite3 is the most widely used server-less Database Engine today. SQLite3 supports a wide range of SQL features, but it is lacking LATERAL-Joins. The LATERAL-keyword allows to reference columns provided by preceding FROM items. The goal of this thesis is to extend SQLite3 with support for LATERAL joins. This project requires skills in C development and knowledge about database internals (i.e., as discussed in Datenbanksysteme 2).

Tobias Burghardt (M.Sc.)

From Recursion To Iteration: Compiling SQL UDFs with Continuations

The goal of this thesis is to investigate and evaluate recursion removal techniques for (PL/pg)SQL UDFs. Continuation-Passing-Style, Defunctionalization, and Trampolining-Style can be used to transform recursive programs into equivalent tail-recursive programs, and finally into iterative programs. After manually compiling several examples, an experimental run-time evaluation should clarify whether this strategy is advantageous over naïve evaluation.


2020

Pascal Engel (B.Sc.)

How-Provenance Through Query Rewriting

In this BSc thesis, a SQL-to-SQL compiler is to be implemented. In recent research, we employed a two-step query rewrite strategy in order to compute Where- and Why-provenance for SQL. Using the approach from above, we want to implement a different provenance semantics, i.e. How-provenance. Related work by O’Grady et al. employed a query compilation approach to How-provenance. Originally, Green et al. introduced the notion of How-provenance in context of relational algebra and semirings.

Fabian Bauer (M.Sc.)

Projektorientierter Informatikunterricht zum Thema “Relationale Datenbanksysteme”

Tim Fischer (B.Sc.)

Utilizing parallelism in the two-phase approach of translated fsUDFs

We translate fsUDFs, UDF with recursive calls, into SQL. The way the UDF then evaluates does not rely on recursive calls anymore. Instead, we first determine the call graph of the fsUDF top-down and then evaluate it bottom-up with the help of (standard SQL) recursive queries. This two-phased approach hides potential for parallelism which we want to explore in this work.


2019

Martin Gabrich (M.Sc.)

Building a Turn Based Strategy Game with Database Backend

Jonas Wolff (B.Sc.)

Frontend for Creating Input for a Database-Supported Map Generator

To support our approach of building maps for video games completely within a database, the goal of this theses is a visual tool for creating building blocks for a map.

Peter Richter (M.Sc.)

How Does ToWithRecursive Hold Up?

Compare the ease of implementation, runtime and other features of ToWithRecursive in PostgreSQL with other analytic and framework environments.

Cedric Breuning (B.Sc.)

Implementierung von Program Slicing auf fsUDFs

Wir suchen nach einer geschickten Implementation von Program Slicing auf fsUDFs.

Stephan Biastoch (M.Sc.)

Intuitive Recursion Preprocessor for PostgreSQL

We aim to be able to write straight forward recursive user-defined functions functions (UDF) in PostgreSQL using LANGUAGE SQL. To achieve this, we need a preprocessor, translating a proper recursive UDF into something PostgreSQL can process utilising WITH RECURSIVE. This preprocessor and its rules have yet to be written and such encompasses the main work in this thesis.

Andreas Herzog (B.Sc.)

Normalization of SQL Queries

The analysis, transformation, and optimization of SQL queries can be streamlined and simplified when the incoming SQL queries show little (or: less) syntactic variety. This thesis will design a normal form for SQL queries that reduces the number of special cases, syntactic oddities, and countless abbreviations in SQL. The resulting SQL queries will be somewhat more verbose and explicit but exhibit syntactic regularity. Normalization is equivalence-preserving (but headache-reducing ;-). Based on our group’s Haskell-based parser for (Postgre)SQL queries, the student will implement the normalization transformation in Haskell.

Noah Doersing (M.Sc.)

Translating UnicodeMath into MathML

Allow the use of UnicodeMath — a Unicode-based intuitive rendering of complex math formulae — in Markdown documents. Translate UnicodeMath input into MathML for rendering (e.g., in Web browsers). Integrate this translator into Morgan McGuire’s Markdeep to make UnicodeMath immediately usable in day-to-day document authoring.


2018

Jonas König (B.Sc.)

Evaluating Game Engines To Infuse Them With DB Technology

While games are at the surface mostly a playful distraction for relaxed afternoons, they deal with considerable amounts of data under the hood. This thesis evaluates the aptness of several open game frameworks for their use with database technology.

Louisa Lambrecht (B.Sc.)

Functional Universe

Games are an opportunity to introduce students to programming gently. This thesis provides an abstraction for a game so that students can easily implement their own game clients.

Alexander Mühlbauer (B.Sc.)

Iteration to Tail Recursion in Python

Certain iterative python functions can be transformed into tail recursive functions using static program analysis. In this thesis, we strive to formulate and implement an automatic transformation method. Our goal: Iterative function goes in, tail recursive function comes out.

Gabriel Paradzik (B.Sc.)

Language-Level Provenance Analysis of SQL

We want you to embed a variant of Data Provenance Analysis directly into SQL queries. The two-step analysis approach to be used has been described in recent publications. In previous research, we first compiled SQL into imperative code and afterwards ran the provenance analysis. Novel to this topic is to omit the compilation step. This means that your implementation will transform an input SQL query into two new SQL queries. While the first query produces execution logs, the second derives the actual data provenance (see Figure 4 in the paper mentioned above).

Denis Hirn (M.Sc.)

PGcuckoo — Inject physical plans into PostgreSQL

Assemble physical execution plans for PostgreSQL and have them executed in the context of regular SQL queries. Creates the ability to design and compile such physical plans directly — no SQL or SQL compilation involved. Enables the design of PostgreSQL code generators that do not rely on SQL serialization and re-parsing. Building blocks: Haskell library and PostgreSQL extension.

Martin Fuß (B.Sc.)

Provenance of SQL Transactions

In our previous work we investigated and implemented Language-Level Provenance Analysis of SQL SELECT-queries. With this work we want to expand our two-phase approach to derive provenance of arbitrary SQL transactions (meaning sequences of SQL DML statements, including SELECT, INSERT, UPDATE and DELETE). The thesis is about the implementation of a SQL transformation tool. It returns two versions of a given SQL program: One equivalent, but with some additional logging. A second one, deriving the program’s provenance.

Zhili Zhang (M.Sc.)

Visualizing Query Plans

This thesis implements a visualiser for query plans in vanilla Javascript to be used in teaching.


2017

Denis Hirn (B.Sc.)

Compilation of SQL into KL

This thesis is about implementing a compiler in Haskell. The source dialect of SQL consists of reading queries only. The compiler has to cover aggregations, correlated subqueries and window functions. The target language will be KL (short for: Kernel Language), an imperative language designed for provenance analysis. All input queries have type annotations. These are to be translated as well.

Moritz Bruder (M.Sc.)

Database-Supported Haskell: Efficient Code Generation for MonetDB

MonetDB is a highly-efficient column-store database system that is a promising execution platform for Database-Supported Haskell (DSH). The goal of this student project is to improve an existing code generator to unlock the full potential of MonetDB. Students will Analyze and profile the code that is currently produced Implement improvements to the current code generator Work with the MonetDB optimizer framework Get to know the internals of an innovative database system This project requires solid skills in Haskell and knowledge about database system internals.

Lena Weinmann (B.Sc.)

Design and Implementation of a Browser-Based User Interface for Habitat

Habitat is an observational debugger for SQL developed in our research group. Users can highlight parts of a buggy SQL query in order to observe intermediate results of the evaluation of particular subexpressions and thus learn about the query’s actual runtime behaviour. Debugging with Habitat is an interactive process where the user places new markings based on the debugger output of previous markings, to narrow down the precise location of an error cause, step by step; playfully hunting the bug.

Noah Doersing (B.Sc.)

Kernel Language to LLVM Compiler

Compiler für die Minimalsprache “Kernel Language” in die hardwarenahe LLVM-Umgebung.

Simon Poschenrieder (M.Sc.)

Reduce query delay in IBM IDAA (Neteeza)

Optimize transaction placement in the IDAA transaction queue to minimize query delay due to update latencies between DB2 and IDAA.

Jonas Lingg (B.Sc.)

Simple Haskell Graphics Library (based on JavaScript Canvas)

Build a simple Haskell-based graphics library, designed to be used in the classroom (target courses: Functional Programming or a future iteration of Informatik 1). Create a shallow wrapper around the Haskell package blank-canvas based on the JavaScipt Canvas API. Hide gory package details and create additional graphics primitives: Add turtle graphics (definitely needed) Add interactivity, e.g. mouse/keyboard event handling? (nice to have)

Martin Lutz (B.Sc.)

Visualization of How-Provenance

This is a topic for a Bachelor or Master thesis in context of our Data Provenance research project. In short, the research field of Data Provenance is about the computation of the exact origin of certain data pieces. In our case, we aim to analyse arbitrary SQL queries and return the exact tuple|cell locations of everything that went into the computation of the querie’s result. Our research approach does not utilize relational algebra which makes it different from most approaches found in the scientific literature.


2016

Vanessa Rath (M.Sc.)

Analyse von Textkorpora im Bereich Recruiting

Marco Schneider (M.Sc.)

Clojure Library for Embedded Queries

Based on a Scheme prototype hacked by Mike Sperber. Maps PL objects to database tables in a flexible fashion. Maps query operations on the PL objects to database operations. Maps algebraic data types to relational tables. Attempts to target different DB backends (JSONiq?). Start work on actual MSc thesis in Mar 2016.

Jan Burchard (M.Sc.)

IBM Netezza Zone Maps and Geographic Data

Simulate and evaluate variants of IBM Netezza zone maps (index-like data structure that holds min/max values of designated columns) for multi-dimensional geographic data. In cooperation with IBM Research & Development, Germany (Böblingen)

Marco Häberle (B.Sc.)

Implementation of a Web-Based Frontend for JIT-Compiled PostgreSQL functions

Most of today’s database systems follow an interpreted query execution approach. Instead of generating a complete query tree, Just-in-time compilers empower us to compile parts of the tree to a intermediate language and finally to native machine code avoiding any interpretation overhead. These functions live in memory only making debugging as well as presenting the code difficult. Therefore a web-based frontend for the integrated display of query tree, LLVM intermediate language and x86 machine code has to be developed.

Daniel O'Grady (M.Sc.)

Optimized Supply Chain Management

Optimierung des Lagerbestandes auf der Basis von Bestell-/Verkaufsdaten der Vergangenheit. Kontext: SAP.

Hien Hung Nguyen (B.Sc.)

Peformance Optimizations for the PostgreSQL JIT Compiler

Jonas Weissensel (M.Sc.)

Ship route simulation based on location and weather data

Michael Zabka (M.Sc.)

Using Spark as a Backend for R

Explore the integration of database management systems and parallel data processing frameworks (here: Spark) into software environments for machine learning and statistical computing (here: R).


2015

Stefan Burnicki (M.Sc.)

Design of a Portable SQL Wire Protocol

Develop a proxy that serializes SQL query result data into a common wire format (JSON?) and converts between data type representations—a single client library suffices to communicate with multiple RDBMS back-ends (Transbase, PostgreSQL). In cooperation with Grau Data (Stuttgart) and Transaction Software GmbH (Munich).

Nadejda Ismailova (B.Sc.)

Development of a Data Provenance Analysis Tool for Python Bytecode

Data Provenance describes the origins of data. In this thesis we propose an approach of computing data provenance when data of interest is created by a Python program and present a tool implementing this approach. In imple- menting the tool, we adopt the following idea. In the first step, the program is instrumented to log data. In the second step, the code is analyzed with respect to the logged data and provenance information is computed.

Peter Richter (B.Sc.)

Extract and Sanitize PostgreSQL Query Parse Trees

The majority of the research projects in our group analyze, transform, and compile SQL queries. These projects vary widely but all of them rely on an internal representation of the incoming SQL query text. PostgreSQL comes with a sophisticated SQL parser — that also incorporates a large number of semantic checks, type inference, and query simplifications/normalizations — and the output of this parser would be the ideal input for the research projects just mentioned.


2014

Philipp Moers (B.Sc.)

Binoculars for Habitat: Implementation of a Web-Based Frontend for an Observational SQL-Debugger

Tobias Fabritz (B.Sc.)

Evaluation of decimal arithmetic for predicate evaluation in LLVM

Janek Bettinger (B.Sc.)

Implementation of a Browser-based Interface for Provenance Analysis

Data provenance is a research topic dealing with the origins of generated data. An already existing tool by the Database Research Group of the University of Tübingen computes provenance relations of Python programs and outputs its results in the relational data format. The primary purpose of this thesis is the development of an interactive browser-based interface that makes the analysis results human-accessible.

Steffen Brennscheidt (B.Sc.)

Implementation of a DSH code generator for MonetDB5 MAL

Alexander Schiller (B.Sc.)

Implementation of an XML to Relational Data Format Conversion Tool

Viele Nicht-Informatiker möchten mit Daten arbeiten, die in XML Dumps enthalten sind. Um diese Daten in eine Form zu bringen, in der diese einfacher zu verwerten sind, haben wir ein Konvertierungstool geschrieben, das XML Dokumente in CSV Dateien umwandelt. Dabei ist das Programm weitgehend unabhängig vom verfügbaren Arbeitsspeicher, was es möglich macht auch sehr große XML Dumps zu konvertieren, und von den Programmierkenntnisse des Anwenders.

Simon Kalt (B.Sc.)

Instrumentation of Python Bytecode and Symbolic Program Analysis

As part of a project to help non-programmers understand computer programs, the goal of this work was to generate information on the behaviour of Python programs. The implementation described in this work uses an approach based on instrumenting Python bytecode to trace back variable assignments in order to build a graph depicting how the output of the program depends on its intermediate and input values. The presented version of the implementation enables the user to analyze simple programs by visualizing the data flow occurring during to the computation of a program’s output.


2013

Development of a webservice frontend to support Performance-Simulation

Achim Kruse

GHC extensions for overloading list notation and rebinding of SQL-like monad comprehension notation

Tobias Müller

In Memory Implementation of Vector Primitives

DSH is a Haskell library for database supported program execution. It turns the database into a coprocessor for the Haskell runtime. Compilation of Haskell code for the database is a multi-step process. One of the intermediate languages is the vector algebra which is of interest to get a better understanding of the performance behaviour of DSH. We have implemented an in memory execution environment which executes vector algebra programs natively. The execution environment allows us to investigate the performance behaviour of each vector operation individually and without database context.

Alexander Zietlow

Performance analysis on database queries on IBM System Z, P & X

Worked on by Jessica Abele (BSc Bioinformatik) @ GE Healthcare, Munich

Representing Hierarchichal Structures in NoSQL Databases

Star Wars Database

Sebastian Brandt

Use variant of XPath Accelerator to index nested PDF data


2012

Implement Data Wrangler for DSH

Database Supported Haskell (DSH) ist eine spezialisierte Abfragesprache für verschachtelte und sortierte Datensammlungen. Diese Ausarbeitung handelt vom DSH-Wrangler, einem interaktiven Werkzeug mit grafischer Benutzeroberfläche (GUI) zur Aufbereitung von tabellarischen, möglicherweise verschachtelten, Daten. Ein langfristiges Ziel des DSH-Wranglers ist die Generierung von DSH-Abfragen entsprechend der Benutzeraktionen in der grafischen Benutzeroberfläche. Die Ausarbeitung beschreibt eine JSON Repräsentation für verschachtelte, tabellarische Daten und ein frühes Stadium der Entwicklung des DSH-Wranglers. Der DSH-Wrangler ist als Webapplikation in JavaScript implementiert und unterstützt die Funktionen und Datentypen von DSH.

Sebastian Engel

Trello Bridge

Use the Trello API to implement services that can be useful to the DB@UTÜ group.


2011

Alexander Ulrich

A Ferry-Based Query Backend for the Links Programming Language

This thesis describes the implementation of a Ferry-based query backend for the Links programming language. In Links, queries are seamlessly embedded into the language: Queries formulated in a subset of the language are translated into single SQL queries. Links uses static checks to ensure that a type-correct query expression can be translated into an equivalent SQL query and allows abstraction over parts of a query. The queryizable subset of Links is, however, severely limited in terms of supported functions and the data type (limited to bags of flat records) of queries.