Database Systems

Bringing Row Pattern Matching to DuckDB: Parsing and Frontend

Row pattern matching in the form of MATCH_RECOGNIZE, a technique that allows finding sequences of rows matching a regular expression-defined pattern, is currently not supported by a number of database management systems (DBMS), such as PostgreSQL and DuckDB. This thesis is part of a research project focused on developing a transpiler that converts SQL containing MATCH_RECOGNIZE into standard SQL. In this context, we address two main parts: The parsing of MATCH_RECOGNIZE clauses and the creation of a web front end for demonstrating the transpiler.

For the parsing, we implement a workflow that uses the ANTLR parser generator, finding it to be a suitable solution for our purpose and greatly reducing the implementation effort. After analyzing the syntax of MATCH_RECOGNIZE, we cover the creation of a grammar and the adaptation of a visitor, generated by ANTLR, for the construction of an abstract syntax tree following a specific class structure. Through a test setup involving syntax-exhaustive example queries, we validate that our workflow is functioning as intended and effectively parses MATCH_RECOGNIZE clauses, providing a basis for further transpilation.

As a frontend solution, we modify the existing web application Compiler Explorer. We cover the integration of the MATCH_RECOGNIZE transpiler into the backend and setting up a mechanism for passing pre-executed SQL statements. We add functionality for displaying intermediate representations as an image. To be able to directly execute transpiled statements, we integrate a WebAssembly version of the DuckDB DBMS into the frontend, including a pane containing the web shell of DuckDB. Special attention is hereby given to handling technical challenges concerning query submission and asynchronous behavior. Finally, the application is containerized using Docker. We find that the chosen workflow is a stable and comfortable solution for demonstrating the MATCH_RECOGNIZE transpiler, but requires implementation effort due to Compiler Explorer’s extensive code base and complex technology stack.

Contact

Louisa Flüchter née Lambrecht