New Compilation Methods for Complex User-Defined Functions
User-defined functions (UDFs) can be used to extend database systems with custom functionality. However, the performance of UDFs is often disappointing, which renders them useless for many applications. It has become common developer wisdom to avoid UDFs in performance-critical applications if possible. Previous work on UDF performance improvements has left many insights by the programming languages community unexplored. This thesis aims to fill this gap by leveraging programming language techniques. We show that these techniques can be adapted such that SQL can be used to express both iterative and recursive UDFs in a way that improves performance over the initial UDF formulation.
In the first part of this thesis, we will focus on iterative UDFs written in variants of PL/SQL. The imperative statement-by-statement style of programming clashes with the plan-based evaluation of SQL queries resulting in an impedance mismatch and therefore friction at runtime, slowing PL/SQL execution down significantly. State of the art approaches can only compile UDFs with simple, linear control flow and cannot handle looping control flow at all, or they require database extensions, which are not portable and not available on all systems. We present a novel approach to UDF compilation that uses trampolined style to compile UDFs with arbitrarily complex looping control flow to pure SQL queries. After compilation, the PL/SQL interpreter is no longer needed. The entire computation is expressed in SQL and can be executed without switching back and forth between the SQL executor and PL/SQL interpreter. This solves the impedance mismatch and eliminates the friction during execution. It also allows the database system to optimize the computation as a whole. Since the computation is expressed in SQL, it can be executed on any system that supports contemporary SQL. This can be used to bring UDFs to systems that lack native support. We show how this compilation approach can be used to improve performance, and the planner’s optimization capabilities. We present a collection of 18 UDFs that we use to evaluate our approach. In our experiments, we show that our approach can improve performance by up to 3 times. We also show cases where our approach does not improve, or even degrades performance. We discuss the reasons for this and show how to avoid these pitfalls.
The second part of this thesis focuses on recursive UDFs. Performance of such recursive UDFs is often disastrous, because database systems are not optimized for these kinds of computations. We use PostgreSQL as an example to show why recursive UDFs are slow, and why the system spends most of the runtime on parsing and planning. This repeated overhead penalizes every function call at runtime, rendering programming with recursive UDFs largely impractical. We propose to take recursive UDFs for what they are: functions. Using tried and tested techniques from the field of programming languages, we show how to compile recursive UDFs to pure SQL queries. We reuse large parts of the compilation pipeline from the first part of this thesis, and show that it can be used to compile recursive UDFs as well. Trampolined style will again be the vehicle for expressing the computation in SQL. We use a set of 10 recursive UDFs to evaluate our approach. Our experiments show that these UDFs typically suffer from over 90% overhead. Compilation eliminates this overhead and improves performance by up to 180 times. This shows that even though functions are not first-class in SQL, they can still be efficient.
The significance of this thesis lies in its identification of trampolined-style SQL as a powerful
tool for expressing computations in SQL. It shows that SQL is expressive enough for arbitrarily
complex computations, and that it achieves excellent performance. The detailed set of rules
presented in this thesis can be used to implement a compiler for UDFs in any database system
that supports LATERAL
joins and recursive CTEs.