Database Systems

Bringing Row Pattern Matching to DuckDB: NFA construction

MATCH_RECOGNIZE is a relatively recent addition to SQL, designed for performing row pattern matching using regular expressions across rows in a database table. How- ever, due to limited support across many database management systems (DBMSS), this functionality is not widely available. To address this gap, this thesis presents a transpiler that translates MATCH_RECOGNIZE statements into semantically equivalent SQL code using only WITH RECURSIVE and WINDOW functions.

A core component of this approach is the construction of a transition table from the regular expression defined in the PATTERN clause. This process leverages Glushkov’s construction algorithm to generate a non-deterministic finite automa- ton (NFA) directly from the syntax tree of the expression. To ensure deterministic evaluation suitable for SQL queries, a transition penalty system is introduced. This mechanism assigns a penalty value to each transition, enabling the resolution of ambiguities and the transformation of the NFA into a deterministic finite automaton (DFA).

The resulting transition table serves as the foundation for the recursive SQL query, enabling portable pattern matching even in systems without native support for MATCH_RECOGNIZE.

Contact

Louisa Flüchter née Lambrecht