What Makes the Duck Quack?
- Readers
- Torsten Grust • Denis Hirn • Tim Fischer • Louisa Lambrecht
⚠️ Signup
To apply for enrollment in this seminar, send an email to
db-lehre@cs.uni-tuebingen.de
by October 20th with the name of the seminar in the subject line and the following content:
- matriculation number
- course of study
- intended degree
- number of semesters
- Which courses in the (wider) field of database systems have you already attended?
There is a limited number of places. Writing an email does not guarantee one. Acceptance/rejection emails will be sent in the first week of the semester.
Uncovering the Internals of the DuckDB Relational Database System
DuckDB is a lightweight relational database system whose internals are engineered to exploit the capabilities of modern hardware (multi-threaded CPUs and large RAM sizes, say). This seminar explores the algorithms, data structures, and grand ideas that let this young system outperform the established DBMS competitors. Students will each study one or two papers that explain a selected DuckDB component and also practically demonstrate how the published techniques “make the duck fly.”
Important Dates
Friday, December 8, 2023 | presentations |
Friday, December 15, 2023 | presentations |
Thursday, February 15, 2024 | hand in paper |
Thursday, February 29, 2024 | hand in reviews* |
Sunday, March 17, 2024 | hand in final paper |
*Review cycle
After the first paper hand-in, each student will receive two peer papers to read and review. Review reports are due two weeks later. You then have until the final hand-in to incorporate the suggestions from the reviews you received into your own paper.
Paper
We recommend reading the following papers up front to get to know DuckDB in general:
- M. Raasveldt, H. Mühleisen. Data Management for Data Science: Towards Embedded Analytics
- M. Raasveldt, H. Mühleisen. DuckDB: an Embeddable Analytical Database
In the seminar, we want to discuss the topics listed below. Topics marked with a star are a mandatory part of the seminar (i.e., one participant will have to work on each of them). All materials (papers, blogposts) listed under a topic together form that topic; it is not sufficient to choose only one paper or blogpost.
1. Window functions
- R. Wesley, F. Xu. Incremental Computation of Common Windowed Holistic Aggregates
- V. Leis, A. Kemper, K. Kundhikanjana, T. Neumann. Efficient Processing of Window Functions in Analytical SQL Queries
- Blogpost: Fast Moving Holistic Aggregates
2. Aggregation / grouping
- V. Leis, P. Boncz, A. Kemper, T. Neumann. Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age
- Blogpost: Parallel Grouped Aggregation in DuckDB
3. Neumann-style query unnesting *
- T. Neumann, A. Kemper. Unnesting Arbitrary Queries
4. Lightweight compression *
- J. Stam. Low overhead self-optimizing storage for compression in DuckDB (Master's thesis)
- Blogpost: Lightweight Compression in DuckDB
5. External sorting
- L. Kuiper, M. Raasveldt, H. Mühleisen. Efficient External Sorting in DuckDB
- L. Kuiper, H. Mühleisen. These Rows Are Made for Sorting and That’s Just What We’ll Do
6. SQL/PGQ extension
- D. ten Wolde, T. Singh, G. Szárnyas, P. Boncz. DuckPGQ: Efficient Property Graph Queries in an analytical RDBMS
- D. ten Wolde, P. Boncz, G. Szárnyas. DuckPGQ: Bringing SQL/PGQ to DuckDB
7. Vectorized expression evaluation *
- T. Kersten, V. Leis, A. Kemper, T. Neumann, A. Pavlo, P. Boncz. Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask
- Blogpost: Execution Format
Notes on the Presentations
Students should show the algorithm/technique described in the paper live during the presentation using the DuckDB prompt or DuckDB source code.
General notes for our seminar presentations can be found here.
Notes on the Papers
- General notes on our papers can be found here.