Why Off-the-Shelf RDBMSs are Better at XPath Than You Might Expect

Torsten Grust • Jan Rittinger • Jens Teubner

Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD 2007), Beijing, China, June 2007.

To compensate for the inherent impedance mismatch between the relational data model (tables of tuples) and XML (ordered, unranked trees), tree join algorithms have become the prevalent means to process XML data in relational databases, most notably the TwigStack [6], structural join [1], and staircase join [13] algorithms. However, the addition of these algorithms to existing systems depends on a significant invasion of the underlying database kernel, an option intolerable for most database vendors.

Here, we demonstrate that we can achieve comparable XPath performance without touching the heart of the system. We carefully exploit existing database functionality and accelerate XPath navigation by purely relational means: partitioned B-trees bring access costs to secondary storage to a minimum, while aggregation functions avoid an expensive computation and removal of duplicate result nodes to comply with the XPath semantics. Experiments carried out on IBM DB2 confirm that our approach can turn off-the-shelf database systems into efficient XPath processors.