Snakes on a Plan — Compiling Python Functions into Plain SQL Queries

Denis HirnTorsten GrustTim Fischer

Proceedings of the 41st ACM SIGMOD Int’l Conference on Management of Data (SIGMOD 2022), Philadelphia, PA, USA, June 2022. doi:10.1145/3514221.3520175

BibTeX Citation
@inproceedings{snakes-on-a-plan,
  author    = {Fischer, Tim and Hirn, Denis and Grust, Torsten},
  title     = {Snakes on a Plan: Compiling Python Functions into Plain SQL Queries},
  year      = {2022},
  isbn      = {9781450392495},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3514221.3520175},
  doi       = {10.1145/3514221.3520175},
  booktitle = {Proceedings of the 2022 International Conference on Management of Data},
  pages     = {2389–2392},
  numpages  = {4},
  keywords  = {SQL, compilation, python, recursion, user-defined functions},
  location  = {Philadelphia, PA, USA},
  series    = {SIGMOD '22}
}

“Move your computation close to the data” is decades-old advice that is hard to follow if your code exhibits complex control flow. The runtime of such applications suffers from a continual back and forth between database-external code execution and plan-based SQL evaluation. We demonstrate the ByePy compiler which translates entire Python functions with arbitrary control flow—including deeply nested iteration—into plain recursive SQL:1999 queries. The invocation of a ByePy-compiled function enters the database engine once to execute the plan of a single query. Computation does not get much closer to the data than this. The system rewards this translation effort from Python to SQL with runtime improvements of up to an order of magnitude.

A sample of Python UDFs as well as their recursive SQL equivalents can be found on GitHub at https://github.com/ByePy/examples.