Bringing Row Pattern Matching to DuckDB: NFA construction
MATCH_RECOGNIZE is a relatively recent addition to SQL, designed for performing row
pattern matching using regular expressions across rows in a database table. How-
ever, due to limited support across many database management systems (DBMSS),
this functionality is not widely available. To address this gap, this thesis presents a
transpiler that translates MATCH_RECOGNIZE statements into semantically equivalent
SQL code using only WITH RECURSIVE and WINDOW functions.
A core component of this approach is the construction of a transition table
from the regular expression defined in the PATTERN clause. This process leverages
Glushkov’s construction algorithm to generate a non-deterministic finite automa-
ton (NFA) directly from the syntax tree of the expression. To ensure deterministic
evaluation suitable for SQL queries, a transition penalty system is introduced. This
mechanism assigns a penalty value to each transition, enabling the resolution of
ambiguities and the transformation of the NFA into a deterministic finite automaton
(DFA).
The resulting transition table serves as the foundation for the recursive SQL query,
enabling portable pattern matching even in systems without native support for
MATCH_RECOGNIZE.