Bringing Row Pattern Matching to DuckDB: Parsing and Frontend
Row pattern matching in the form of MATCH_RECOGNIZE, a technique that allows finding
sequences of rows matching a regular expression-defined pattern, is currently not
supported by a number of database management systems (DBMS), such as PostgreSQL
and DuckDB. This thesis is part of a research project focused on developing a transpiler
that converts SQL containing MATCH_RECOGNIZE into standard SQL. In this context, we
address two main parts: The parsing of MATCH_RECOGNIZE clauses and the creation of
a web front end for demonstrating the transpiler.
For the parsing, we implement a workflow that uses the ANTLR parser generator, finding
it to be a suitable solution for our purpose and greatly reducing the implementation
effort. After analyzing the syntax of MATCH_RECOGNIZE, we cover the creation of a
grammar and the adaptation of a visitor, generated by ANTLR, for the construction of an
abstract syntax tree following a specific class structure. Through a test setup involving
syntax-exhaustive example queries, we validate that our workflow is functioning as
intended and effectively parses MATCH_RECOGNIZE clauses, providing a basis for further
transpilation.
As a frontend solution, we modify the existing web application Compiler Explorer. We
cover the integration of the MATCH_RECOGNIZE transpiler into the backend and setting
up a mechanism for passing pre-executed SQL statements. We add functionality for
displaying intermediate representations as an image. To be able to directly execute
transpiled statements, we integrate a WebAssembly version of the DuckDB DBMS into
the frontend, including a pane containing the web shell of DuckDB. Special attention
is hereby given to handling technical challenges concerning query submission and
asynchronous behavior. Finally, the application is containerized using Docker. We find
that the chosen workflow is a stable and comfortable solution for demonstrating
the MATCH_RECOGNIZE transpiler, but requires implementation effort due to Compiler
Explorer’s extensive code base and complex technology stack.