
XQuery Rewrite Optimization in IBM® DB2®∗ pureXML™
Fatma Özcan
IBM Almaden Research Center
650 Harry Road, San Jose
Normen Seemann
IBM Silicon Valley Lab
555 Bailey Road, San Jose
Ling Wang
IBM Silicon Valley Lab
555 Bailey Road, San Jose
Abstract
In this paper, we describe XQuery compilation and rewrite optimization in DB2 pureXML, a hybrid
relational and XML database management system. DB2 pureXML has been designed to scale to large
collections of XML data. In such a system, effective filtering of XML documents and efficient execution
of XML navigation are vital for high throughput. Hence, the focus of rewrite optimization is to consolidate
navigation constructs as much as possible and to push comparison predicates and navigation
constructs down into data access to enable index usage. We describe the new rewrite transformations
we have implemented specifically for XQuery and its navigational constructs. We also briefly
discuss how some of the existing rewrite transformations developed for the SQL engine are extended and
adapted for XQuery.
1 Introduction
XML has emerged in the industry as the predominant mechanism for representing and exchanging structured
and semi-structured information across the Internet, between applications, and within an intranet. Key benefits
of XML are its vendor and platform independence and its high flexibility. With the proliferation of XML data,
several XML management systems [7, 10, 17, 5, 4, 6, 12, 11, 14] have been developed over the last couple of
years. All major database vendors have released XML extensions to their relational engines, in addition to many
native XML management systems. XQuery [18] and SQL/XML [9] are the two industry-standard languages
that are supported by these systems to query XML. Most current research focuses on the optimization of
XQuery and SQL/XML in these XML management systems.
In this paper, we describe XQuery rewrite optimization within the context of DB2 pureXML [4], which is a
hybrid relational and XML database engine that provides native XML storage, indexing, navigation and query
processing through both SQL/XML [9] and XQuery [18], using the XML data type introduced by SQL/XML.
DB2 pureXML stores XML data in columns of relational tables, as instances of the XQuery data model [19]
in a structured, type-annotated tree. By storing a binary representation of type-annotated trees, DB2 pureXML
avoids repeated parsing and validation of documents. The DB2 pureXML [4] query evaluation run-time contains
three major components for XML query processing: (1) the XML navigation engine, (2) the XML index run-time and
(3) the XQuery function library. Additionally, several relational runtime operators have been extended to deal
Copyright 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for
advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any
copyrighted component of this work in other works must be obtained from the IEEE.
Bulletin of the IEEE Computer Society Technical Committee on Data Engineering
∗ DB2 pureXML is a trademark or registered trademark of International Business Machines Corporation.