The Nautilus Analyzer: Understanding and Debugging Data Transformations

Melanie Herschel • Hanno Eichelberger

Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), Maui, Hawaii, USA, November 2012.

When developing data transformations (a task omnipresent in applications such as data integration, data migration, data cleaning, and scientific data processing), developers quickly face the need to verify the semantic correctness of the transformation. Declarative means of specifying data transformations, e.g., SQL or ETL tools, increase developer productivity but usually provide limited or no support for inspection or debugging. As a result, developers today have no choice but to analyze the transformation manually and, in case of an error, to fix and test it, often repeatedly.

The goal of the Nautilus project is to semi-automatically support this analysis-fix-test cycle. This demonstration focuses on one main component of Nautilus, the Nautilus Analyzer, which helps developers understand and debug their data transformations. The demonstration will show the capabilities of this component for data transformations specified in SQL, using scenarios from different domains that are based on real-world data.

We provide an overview of the Nautilus Analyzer, discuss components and implementation techniques, and outline our demonstration plan. The Nautilus website (http://nautilus-system.org) features a video, screenshots, and further details.