Quilt: an XML Query Language

31 March 2000

Authors:
Jonathan Robie (Software AG)
     <Jonathan.Robie@SoftwareAG-USA.com>
Don Chamberlin (IBM Research - Almaden)
     <chamberlin@almaden.ibm.com >
Daniela Florescu (INRIA)
     <Daniela.Florescu@inria.fr >

Abstract

XML is an extremely versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. A query language of similar versatility is needed to realize the potential of XML as a universal medium for data interchange. Most existing proposals for XML query languages are robust for particular types of data sources but weak for other types. In this paper, the authors combine features from several sources to propose a new query language called Quilt, which is designed to be broadly applicable across all types of XML data sources.

Table of contents

1 Introduction
2 A First Look at Quilt
3 The Quilt Language
    3.1 Syntax
    3.2 FOR
    3.3 LET
    3.4 WHERE
    3.5 RETURN
    3.6 Filtering Documents
4 Querying Relational Data
5 Querying Structured Documents
6 Exploiting References in Documents
7 Conclusion
8 References


1 Introduction

"Mediocre composers borrow; great composers steal." -- Igor Stravinsky

As increasing amounts of information are stored in XML, exchanged in XML, and presented as XML through various interfaces, the ability to intelligently query XML data sources becomes increasingly important. The data model of XML is quite different from the data models of traditional databases, and traditional database languages are not well suited to querying XML sources. In addition, XML blurs the distinction between data and documents, allowing documents to be treated as data sources and traditional data sources to be treated as documents. Query languages, including XML query languages, still tend to be designed either for documents or for data. Since XML may represent a rich variety of information coming from many sources and structured in many ways, an XML query language must try to provide the same rich variety in the queries that may be formulated.

Quilt is a query language for XML. It originated when the authors attempted to apply XML query languages such as XML-QL, XPath, XQL, YATL, and XSQL to a variety of use cases. We found that each of these languages had strong advantages for some of the queries we examined, but was unable to express other queries we found equally important. Therefore, we chose to take some of the best ideas from these languages, plus some ideas from SQL and OQL, integrating them with a fresh syntactic approach. Our goal was to design a small, implementable language that met the requirements specified in the W3C XML Query Working Group's XML Query Requirements. During our design work, we have adapted features from various languages, carefully assembling them to form a new design--hence the name "Quilt". The resulting language supports queries that draw information from various sources and patch them together in a new form. This is another reason for the name "Quilt".

Although Quilt borrows from many languages, its conceptual integrity comes from a deep reliance on the structure of XML, which is based on hierarchy, sequence, and reference. Quilt is able to express queries based on document structure and to produce query results that either preserve the original document structure or generate a new structure. Quilt can operate on flat structures, such as rows from relational databases, and generate hierarchies based on the information contained in these structures. It can also express queries based on parent/child relationships or document sequence, and can preserve these relationships or generate new ones in the output document. Although Quilt can be used to retrieve data from objects, relational databases, or other non-XML sources, this data must be expressed in an XML view, and Quilt relies solely on the structure of the XML view in its data model.

We begin by describing the Quilt language, and then illustrate how the language may be used for a variety of queries, including queries typically used for relational data, semi-structured data, and structured documents.

2 A First Look at Quilt

Two languages that were particularly influential in the design of Quilt were XML-QL and XQL. We will begin our description of Quilt by discussing some of the concepts it inherited from these two languages. In the discussion that follows, we conceptualize the nested elements of an XML document as nodes in a tree-like data model. Sometimes we will speak in terms of the elements and attributes of the XML document, and sometimes in terms of the nodes of the tree that models the document.

The core of an XML-QL query consists of a WHERE clause that binds one or more variables, and a CONSTRUCT clause that uses the values of the bound variables to construct a new document, possibly having a structure quite different from that of the original document(s). This is a very powerful concept because it allows documents to be joined, grouped, and transformed in arbitrary ways. However, the patterns that are used in XML-QL for binding variables tend to be unnecessarily verbose. More important, the result of the WHERE clause is a relation composed of scalar values, which is not sufficient to preserve all the information about the hierarchic and sequential relationships among the elements of the original document. As a result, XML-QL is a very powerful language for asking certain types of questions, but is weak at expressing queries based on hierarchy and sequence, such as "Find the second quotation in the fourth chapter." From XML-QL, Quilt borrows the concept of constructing a new document from bound variables, but Quilt uses a different paradigm for binding the variables and a richer data model to represent the result of the binding.

XQL-98 contains a powerful syntax for navigating in a hierarchic document and for selecting a set of nodes that satisfy some complex condition. This syntax is similar to the abbreviated syntax of XPath, which has been adopted as a W3C Recommendation and has been used in various XML-related applications such as XPointer and XSLT. For selecting sets of nodes, Quilt uses abbreviated XPath expressions, enhanced by certain operators borrowed from XQL. Since an XPath expression, in general, returns a "forest" of nodes in which hierarchy and sequence are preserved, Quilt is able to express queries based on hierarchy and sequence more easily than languages such as XML-QL.

In this section, we will introduce the Quilt language using some example queries based on the following XML document:

<?xml version="1.0"?>
<!DOCTYPE bib SYSTEM "books.dtd">
<bib> 
  <book> 
     <title>Harold and the Purple Crayon</title> 
     <author> 
        <lastname>Johnson</last> 
        <firstname>Crockett</first> 
     </author>
     <pubinfo> 
       <publisher>Harper and Row</publisher> 
       <price>$4.76</price> 
       <year>1955</year>
     </pubinfo>
  </book> 
  <book> 
     <title>Harold's Fairy Tale</title> 
     <author> 
        <lastname>Johnson</last> 
        <firstname>Crockett</first> 
     </author>
     <pubinfo>
       <publisher>Harper and Row</publisher> 
       <price>$4.76</price>
       <year>1956</year>
     </pubinfo>
  </book> 
  <book> 
     <title>Rise Up Singing</title> 
     <author> 
        <lastname>Blood</last> 
        <firstname>Peter</first> 
     </author>
     <author> 
        <lastname>Patterson</last> 
        <firstname>Annie</first> 
     </author>
     <pubinfo>
       <publisher>Sing Out Corporation</publisher> 
       <price>$15.45</price>
       <year>1988</year>
     </pubinfo>
  </book> 
</bib>

A simple form of a Quilt query consists of FOR, WHERE, and RETURN clauses. The FOR clause uses XPath expressions to bind the values of one or more variables. In general, an XPath expression evaluates to a set of nodes. The FOR clause generates an ordered list of tuples, each containing of a value for each of the bound variables. A tuple is generated for each possible way of binding the list of variables to nodes that satisfy their respective XPath expressions. When a node is bound to a variable, its descendant nodes are carried along with it. The WHERE clause applies a filter to the tuples, retaining only those tuples that satisfy a given search condition. The RETURN clause then generates a new document structure using the values of the bound variables.

The following example finds every book written by Crockett Johnson. The FOR clause generates a list of bindings in which $b is bound to individual book elements in the document found at the given URL, and $a is bound to individual author elements that are descendants of $b. The WHERE clause retains only those tuples of bindings in which the author is Crockett Johnson, and the RETURN clause uses the resulting values of $b to generate a list of books. By default, the ordering of book elements in the original document is preserved.

FOR    $b IN document("http://www.biblio.com/books.xml")//book,
       $a IN $b/author
WHERE  $a/firstname = "Crockett" AND  $a/lastname = "Johnson"
RETURN $b

The semantics of comparisons in Quilt is the same as in XPath. For example, in the query above, consider the comparison $a/lastname = "Johnson". In general, an XPath expression such as $a/lastname evaluates to a set of nodes. The comparison is considered to be True if at least one of the nodes returned by $a/lastname has a string-value equal to "Johnson".

The query shown above returns a list of book elements without a common root element to hold them. Since a well-formed XML document must have precisely one root element, we may desire to generate a new element to enclose the results of our query. The easiest way to do this is to embed the entire query within an element constructor, as illustrated in the following query:

<Result>
    (
    FOR    $b IN document("http://www.biblio.com/books.xml")//book,
           $a IN $b/author
    WHERE  $a/firstname = "Crockett"
      AND  $a/lastname = "Johnson"
    RETURN $b
    )
</Result>

In the above queries, the result consists of selected nodes from the original document. Each selected node carries its descendants with it to the result document, preserving part of the structure of the original document. Many queries, however, need to create an output document that is structured in a new way. One example of such a transformation is a query that inverts the hierarchy of the input data, converting it from authors nested inside books to books nested inside authors. This inversion can be accomplished by the following query:

<Result>
    (
    FOR $a IN DISTINCT document("http://www.biblio.com/books.xml")//author
    RETURN 
      <BooksByAuthor>
         <Author> 
             $a/lastname/text()
         </Author>
         (
         FOR $b IN document("http://www.biblio.com/books.xml")//book[author=$a]
         RETURN $b/title SORTBY(title)
         )
      </BooksByAuthor> SORTBY(Author)
    )
</Result>

The result of this query on the sample data shown above is as follows:

<Result>
    <BooksByAuthor>
        <Author>Blood</Author>
        <title>Rise Up Singing</title> 
    </BooksByAuthor>
    <BooksByAuthor>
        <Author>Johnson</Author>
        <title>Harold and the Purple Crayon</title>
        <title>Harold's Fairy Tale</title> 
    </BooksByAuthor>
    <BooksByAuthor>
        <Author>Patterson</Author>
        <title>Rise Up Singing</title> 
    </BooksByAuthor>
</Result>

The following points are worth noting about the above example:

  1. This example illustrates how a query can be nested inside another query. The variable $a, defined in the outer query, ranges over the set of distinct authors in the input document. The variable $b, defined in the inner query, ranges over the individual books written by a given author.
  2. The data contained in the "Author" elements of the output document is selected from the "lastname" elements of the input document. The contents of these elements are extracted using XPath expressions that use the text() function, and then enclosed in new tags in the RETURN clause.
  3. The order of elements in the output document is controlled by SORTBY clauses in both the inner and outer queries.
  4. The predicate [author = $a] involves a comparison of complex types. Quill compares elements that have substructure by normalizing whitespace and testing for the same content and structure, in the same sequence. One concrete way to do this is by serializing content and markup. For example, the normalized string value of this author element:
    <author> 
        <last>Johnson</last> 
        <first>Crockett</first> 
    </author>
    
    is formed by serializing the markup, removing leading and trailing whitespace for each element, and removing whitespace between tags, as follows:
    <author><last>Johnson</last><first>Crockett</first></author>
    
    Once a complex element has been converted to a normalized string form, it can be compared to the normalized string value of another complex element.

The queries discussed above explicitly state the URL of the document to be queried. This is not always necessary, and is sometimes undesirable. Some queries operate on a set of nodes for which there is no URL, or for which it is not convenient to provide a URL; for instance, a query may operate on the set of documents found in a web site or document repository, the set of nodes found in a collection used in some programming language, or the results of a previous query. Furthermore, it is useful to be able to write queries that may be applied to a variety of data sources, including data sources that do not exist at the time that the query is written. If the document() function does not occur in a variable binding, the binding is applied to the set of input nodes for the query, which is implicitly determined by the environment in which the query is executed. The following query is equivalent to the first query shown in this paper, except that it binds the variable $b to all book elements contained within the implicit set of input nodes for the query:

FOR    $b IN //book,
       $a IN $b/author
WHERE  $a/first = "Crockett"
  AND  $a/last = "Johnson"
RETURN $b

At this point, we have seen the basic structure of a Quilt query. The next section describes the syntax of Quilt in more detail.

3 The Quilt Language

This section discusses the structure and meaning of a Quilt query, starting with top-level syntax and then examining each clause in detail.

3.1 Syntax

The following syntax shows the top-level structure of a Quilt query. The grammar is still being developed as the language evolves, and is incomplete in the current version of this document. Terminal symbols are shown in angular brackets and their lexical structure is not further specified.

  Quilt Grammar 
Query ::= Function_defn*   Expression 
Function_defn ::= 'FUNCTION' <Function_name>  '('  <Variable> *  ')'  '{'  Expression '}' 
Expression ::= <Variable>
|   <Constant>
|   <Function_name> '(' ExpressionList ')' 
|   Expression Operator Expression
|   <XPathExpression>
|   ElementConstructor
|   FLWR_Expression
|   '('  Expression  ')'
ExpressionList ::=  Expression 
|   Expression ',' Expression
ElementConstructor ::= StartTag   ExpressionList   EndTag
StartTag ::= '<' <String>   Attribute* '>' 
EndTag ::= '</' <String>  '>'
Attribute ::= <String> '=' Expression
|   Expression
FLWR_Expression ::= (FOR_clause | LET_clause)+   WHERE_clause?   RETURN_clause
FOR_clause ::=  'FOR'   FOR_binding   (,   FOR_binding)*
FOR_binding ::=  <Variable>  'IN'   'DISTINCT'?    Expression
LET_clause ::= 'LET'  LET_binding   (,   LET_binding)*
LET_binding ::= <Variable>  ':='   'DISTINCT'?    Expression
WHERE_clause ::= 'WHERE' Expression
RETURN_clause ::= 'RETURN'  Expression   SORTBY_clause?
SORTBY_clause ::= 'SORTBY' '(' ExpressionList ')'
     ( 'ASCENDING' | 'DESCENDING' )?
Operator ::= '<' | '<=' | '>' | '>=' | '=' | '!=' | '+' | ....
 

A Quilt query can begin with the definition of one or more functions that are used in the body of the query. The body of the query is simply an expression. One important type of query is based on an "FLWR_expression", which is constructed from FOR, LET, WHERE, and RETURN clauses. But a query may also be based on other types of expressions, such as an XPath expression or a element constructor. Element constructors are often used to generate new output elements that contain data computed by the query.

In this grammar, XPathExpression is a terminal symbol. This symbol represents an expression based on the abbreviated syntax of XPath, enhanced by the following operators which are borrowed from XQL '99:

  1. The BEFORE and AFTER operators. "a BEFORE b" evaluates to the set of nodes in 'a' that occur before some node in 'b'. "a AFTER b" evaluates the set of nodes in 'a' that occur after some node in 'b'. In the section "Querying Structured Documents, we will see how these operators are used for expressing conditions based on sequence.

  2. The INTERSECT operator. "a INTERSECT b" evaluates to the set of nodes in 'a' that are also found in 'b'. This is an enhancement to XPath, which contains an operator for set union but not for intersection.

  3. The EXCEPT operator. "a EXCEPT b" evaluates to the set of nodes in 'a' that are not also found in 'b'.

We also use the document() function, which is taken from XSLT. In this paper, the argument of the document() function is always a URL. When used in this way, the document() function returns the root node of the document. Some other functions are also used in the example queries, and are introduced where they are used. There is not yet a well-defined standard function library for Quilt.

The overall flow of information through an FLWR_expression is illustrated in Figure 1. The input to the expression consists of one or more XML documents or fragments that are named in the FOR and/or LET clauses. Each of these fragments can be conceptualized as an "ordered forest"--that is, an ordered list of nodes representing XML elements, each with its tree of descendant nodes. The result of the FOR and LET clauses is an ordered list of tuples, each containing a value for each of the bound variables. The value of a variable bound by a FOR clause is a tree (a single node and its descendants). The value of a variable bound by a LET clause is a (possibly empty) ordered forest of nodes. The WHERE clause serves as a filter that discards some of the tuples and retains others. Finally, the RETURN clause is executed for each surviving tuple, generating output nodes from the values of the bound variables. The nodes generated by the RETURN clause can be linearized into an output XML document or fragment.



Data flows FROM to LET to WHERE to RETURN
Figure 1: Flow of data in an FLWR expression


3.2 FOR

The FOR clause generates bindings for one or more variables. Each variable introduced in the FOR clause is associated with an expression (typically, an XPath expression.) In general, each of these expressions returns a set of nodes. The result of the FOR clause is a set of tuples, each of which contains a binding for each of the variables. The variables are bound to individual nodes in the sets returned by their respective expressions, together with their descendants. The number of tuples generated by a FOR clause is the product of the cardinalities of the node-sets returned by the respective expressions. The implied ordering among the tuples generated by the FOR clause is derived from the ordering of their elements in the input document, with the first bound variable taking precedence, followed by the second bound variable, etc.

In a future version of the Quilt grammar, we plan to provide a way in which the ordering of the tuples generated by a FOR clause can be relaxed, in order to permit more efficient processing of queries in which order is not important.

The DISTINCT keyword can be applied independently to each expression in a FOR clause, serving to eliminate duplicate values from the node-sets returned by the expression. Equality is defined by equality of value rather than by identity. A node set generated using the keyword DISTINCT is unordered, and the tuples of bindings generated from such a node set are unordered also. When DISTINCT is specified and several candidate nodes of equal value are available for binding, Quilt does not specify which of the candidate nodes is bound to the variable.

The following queries illustrate use of the DISTINCT keyword:

FOR $p IN //publisher
RETURN $p

Result:

<publisher>Harper and Row</publisher> 
<publisher>Harper and Row</publisher> 
<publisher>Sing Out Corporation</publisher> 

FOR $p IN DISTINCT //publisher
RETURN $p

Result (unordered):

<publisher>Sing Out Corporation</publisher> 
<publisher>Harper and Row</publisher> 

The tuples generated by a FOR clause can be used to construct tabular representations of hierarchical data, such as might be used in HTML tables or relational databases. The following example query creates part of an XHTML document in which names of authors and book titles appear in separate columns of a table:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
     <title> Books By Author </title>
  </head>
  <body>
    <p> 
        The following table lists authors and books written
        by each author. 
    </p>
    <table width="50%" border="1">
       (
       FOR $a IN DISTINCT //author ,
           $b IN //book[author = $a]
       RETURN 
         <tr> 
           <td>
               $a/first/text()
               $a/last/text()
           </td>
           <td>
               $b/title/text()
           </td>
         </tr>
       )
    </table>
  </body>
</html>

The table generated by the above query might have the following appearance:

Crockett Johnson Harold and the Purple Crayon
Crockett Johnson Harold's Fairy Tale
Peter Blood Rise Up Singing
Annie Patterson Rise Up Singing

3.3 LET

The LET clause binds a variable to the value of an expression (typically an XPath expression.) Since an expression in general returns an ordered list of nodes, the value of a variable bound in a LET clause is an ordered list of nodes, together with their descendants (an "ordered forest"). The value bound by the LET clause preserves the ordering of nodes in the input document, unless DISTINCT is specified. DISTINCT causes duplicate nodes to be eliminated and the ordering of the nodes to become undefined.

The FOR and LET clauses work together to generate tuples of variable bindings. Unlike a FOR clause, however, a LET clause does not affect the number of tuples that are generated. Each LET clause binds its variable to exactly one value (an "ordered forest"). If a query contains a LET clause but no FOR clause, exactly one tuple of variable bindings is generated.

Each binding in a FOR or LET clause may contain references to variables that have already been bound in earlier clauses or in an outer-level query.

A LET clause is often used to bind a variable to a set of values that is used as the argument of some aggregate function such as avg(). For example, the following query returns the average price of all the books in the input document:

LET $b := //book
RETURN 
   <avgprice> avg($b/price) </avgprice>

A LET clause can be combined with a FOR clause to generate several sets and to perform some computation on each. The following example uses variable $pub to iterate over all the publishers in the input document, binds variable $pri to the set of prices of books published by each publisher, and then generates an output document that lists each publisher with its average book price.

FOR $pub IN DISTINCT //publisher
LET $pri := document("http://www.biblio.com/books.xml")
              //book/pubinfo[publisher=$pub]/price
RETURN 
   <pubinfo>
      $pub ,
      <avgprice> avg($pri) </avgprice>
   </pubinfo>

3.4 WHERE

The FOR and LET clauses generate tuples of bindings for the variables that are introduced in these clauses. Each of these tuples is subject to further filtering by the WHERE clause. Only those tuples of bindings for which the condition in the WHERE clause is true are used to invoke the RETURN clause.

In the WHERE clause, predicates may be combined using parentheses, AND, OR, and NOT. Predicates are based on XPath expressions that contain the variables bound in the FOR and LET clauses. The WHERE clause may also use several operators taken from XQL: set intersection is expressed with the INTERSECT keyword, sequence is expressed with the BEFORE and AFTER operators, and set difference is expressed using the EXCEPT operator. These operators will be illustrated by example queries throughout the remainder of this paper.

In the following example, the WHERE clause uses a built-in function named exists(), which returns False if its argument is the empty set, and otherwise returns True. The query finds books that have titles but no authors.

<orphan_books>
   (
   FOR $b IN //book
   WHERE exists($b/title)
   AND NOT exists($b/author)
   RETURN $b/title
   )
</orphan_books>

Remember that variables bound in a FOR clause are bound to individual nodes (with their descendants), but variables bound in a LET clause are bound to ordered sets of nodes (with their descendants). In the WHERE clause, appropriate predicates must be used with each type of variable. For example, in the following query, $bk is bound to a set of books, and the WHERE clause appropriately applies a count() function to count the number of books in the set. The query returns publishers who have published more than 100 books.

FOR $pub IN DISTINCT //publisher
LET $bk := //book[pubinfo/publisher=$pub]
WHERE count($bk) > 100
RETURN $pub

If it were desired to add an additional condition on books, such as "find publishers who published more than 100 books in 1999", this condition could not be added to the WHERE clause, since the WHERE clause has access only to sets of books, not to individual books. The proper place to add such a condition would be in the XPath expression that defines $bk, as follows:

FOR $pub IN DISTINCT //publisher
LET $bk := //book[pubinfo/publisher=$pub AND pubinfo/year="1999"]
WHERE count($bk) > 100
RETURN $pub

3.5 RETURN

The RETURN clause generates the output of the FLWR-expression, which may be a node, an "ordered forest" of nodes, or a primitive value. The RETURN clause is invoked once for each tuple of variable bindings that is generated by the FOR and LET clauses and satisfies the condition in the WHERE clause. If an ordering exists among the tuples of variable bindings, the RETURN clause is invoked on each tuple, in order, and the order is preserved in the output document. The RETURN clause contains an expression that may include structured XML text, bound variables and XPath expressions based on these variables, and subqueries. We have already seen many examples of RETURN clauses, and we will see more in the queries that follow.

It is often important to specify an order for the elements in a query result, supplementing or superceding the ordering derived from the bindings of variables. If the query result contains several levels of elements, an ordering may be required among the elements at each level. Quilt provides a SORTBY clause that may be used after a constructed element, variable, or path expression to specify an ordering among the generated elements. The arguments of the SORTBY clause are evaluated relative to the individual nodes to be sorted, and may be followed by ASCENDING or DESCENDING to specify the direction of the sort (ASCENDING is the default).

The following query illustrates some of the features of SORTBY. It generates an alphabetical list of publishers. Within each publisher it generates an alphabetical list of book titles, followed by a list of authors ordered by their last and first names. The notation "SORTBY(.)" indicates that the generated elements are to be ordered by their content.

<result>
   (
   FOR $p IN DISTINCT //publisher
   RETURN
      <publisher>
         <name>
            $p/text()
         </name> ,
         <books>
            (
            FOR $t IN DISTINCT //book[pubinfo/publisher = $p]/title
            RETURN $t SORTBY(.)
            )
         </books> ,
         <authors>
            (
            FOR $a IN DISTINCT //book[pubinfo/publisher = $p]/author
            RETURN $a SORTBY(lastname, firstname)
            )
         </authors>
      </publisher> SORTBY(name)
   )
</result>     

3.6 Filtering Documents

An important class of queries extracts selected elements from a document while preserving the relationships among the selected elements. In many cases, both hierarchic and sequential relationships must be preserved. This type of query can be expressed in Quilt by using the keyword FILTER in a LET clause.

As we have seen, a LET clause binds a variable to an ordered set of nodes that are selected by an XPath expression. We will refer to these nodes as the "selected nodes". In general, each of the selected nodes carries with it all its descendant nodes, so we may think of the selected nodes as a list of trees (node hierarchies). A LET clause may also contain the keyword FILTER followed by a second XPath expression called the filter. Each of the subtrees in the selected list is "filtered" in such a way that only those nodes are retained that satisfy the filter XPath expression. Descendants of nodes that satisfy the filter are not retained unless they independently satisfy the filter. The hierarchical and sequential relationships among the surviving nodes are preserved (for example, if node Y is a descendant of node X and all the intervening nodes between X and Y fail to satisfy the filter, then Y becomes an immediate descendant of X.)

If the right-hand side of a LET clause contains only the FILTER keyword followed by a filter expression, the input to the filter is the implicit input document of the query.

Figure 2 shows the effect of applying a filter to a tree of nodes. In this case, the filter retains only nodes of type A and B, causing the original tree to split into two trees.



Illustrates action of filter function
Figure 2: Effects of filtering on a tree of nodes


We will illustrate the use of FILTER by writing some queries that operate on a document named "cookbook.xml". The document contains many different kinds of elements, including Section elements that may be nested inside each other to many levels of nesting. Each Section element immediately contains a Title element. Mixed among the other elements, at various levels of nesting, are Figure elements, which may also contain Title elements. The following fragment illustrates the structure of the cookbook:

<Section><Title>Desserts</Title>
   <Para>This section tells you all about yummy desserts.</Para>
   <Section><Title>Cookies</Title> 
      <Para>Cookies are flat things that sometimes have chocolate chips.
         <Figure image="cookie1"><Title>A Big Cookie</Title></Figure>
      </Para>
   </Section>
   <Section><Title>Candy</Title> 
      <Para>Candy is dandy.</Para>
   </Section>
   <Section><Title>Pies and Tarts</Title> 
      <Para>Pies and tarts are even better.</Para>
      <Figure image="applepie"><Title>An Apple Pie</Title></Figure>
      <Figure image="peachpie"><Title>A Peach Pie</Title></Figure>
   </Section>
   <Section><Title>Cakes and Cupcakes</Title> 
      <Para>Cakes and Cupcakes are absolutely awesome.</Para>
      <Section><Title>Cakes</Title>
         <Figure image="choc_cake"><Title>A Chocolate Cake</Title></Figure>
      </Section>
      <Section><Title>Cupcakes</Title>
         <Para>Cupcakes are not as big but just as good.</Para>
      </Section>
   </Section>
</Section>

Suppose that we wish to generate a table of contents for the cookbook, in which the hierarchy of nested sections is preserved but only the title of each section is included. We might think of this as a process of "projecting" the sections and section titles out of the document, preserving their original hierarchy and sequence, while eliminating other kinds of elements. The following query could be used for this purpose:

<toc>
  (
    LET $s := document("cookbook.xml") 
                 FILTER //Section | //Section/Title)
    RETURN $s
  )
</toc>

In this example, the LET clause uses FILTER to bind variable $s to a copy of the cookbook document that contains only Section elements and Title elements that are immediately contained within Section elements, preserving the original relationships among these elements. All other elements are removed, even if they are descendants of a selected element. Since $s then contains exactly the elements that we wish to include in the output, the query can simply return $s without any further processing. The result of applying this query to the document fragment above is as follows:

<Section><Title>Desserts</Title>
   <Section><Title>Cookies</Title> 
   </Section>
   <Section><Title>Candy</Title> 
   </Section>
   <Section><Title>Pies and Tarts</Title> 
   </Section>
   <Section><Title>Cakes and Cupcakes</Title>
        <Section><Title>Cakes</Title>
        </Section>
        <Section><Title>Cupcakes</Title>
        </Section>
   </Section>
</Section>

The hierarchy and sequence of the query results are determined by the document, and could not have been known in advance by the person formulating the query; in fact, the purpose of the query is to determine this unknown hierarchy and sequence.

Now suppose that we wish to extend our table of contents to include the number of Figures in each Section. This is a much more complex transformation in the structure of the document, and we will accomplish it by defining a function. As shown in the Quilt grammar, a query can begin by defining a function that takes one or more parameters. In this example, we will define a function named "section_summary" that takes a Section element as its parameter. A function is allowed to call itself recursively, and we will make use of this capability in this example. The "section_summary" function generates a new Section element that contains only the Title of the original Section and the number of Figures that are immediately contained in it; it also invokes itself recursively to process all immediately nested Sections in the same way.

The main part of the query invokes the "section_summary" function on all the top-level Sections of the document. But before doing this, it filters the input document so that it contains only Section, Title, and Figure elements. The filtering step is necessary so that all Figure elements appear to be immediately contained within their respective Sections, even if they were originally nested inside some other element such as a paragraph or a list. By eliminating intervening elements, the FILTER enables each Section to compute an accurate count of its Figures.

FUNCTION section_summary($s)
  {
    <Section>
      $s/Title ,
      (
        LET $f := $s/Figure
        RETURN <Figcount> count($f) </Figcount>
      ) ,
      (
        FOR $ss IN $s/Section
        RETURN section_summary($ss)
      )
  }

<toc>
   LET $stf := document("cookbook.xml")
                  FILTER //Section | //Title | //Figure
   FOR $s IN $stf/Section
   RETURN section_summary($s)
</toc>

The result of applying this query to the document fragment shown above is as follows:

<Section><Title>Desserts</Title>
   <Figcount> 0 </Figcount>
   <Section><Title>Cookies</Title>
      <Figcount> 1 </Figcount>
   </Section>
   <Section><Title>Candy</Title>
      <Figcount> 0 </Figcount>
   </Section>
   <Section><Title>Pies and Tarts</Title> 
      <Figcount> 2 </Figcount>
   </Section>
   <Section><Title>Cakes and Cupcakes</Title>
      <Figcount> 0 </Figcount>
      <Section><Title>Cakes</Title>
         <Figcount> 1 </Figcount>
      </Section>
      <Section><Title>Cupcakes</Title>
         <Figcount> 0 </Figcount>
      </Section>
   </Section>
</Section>

4 Querying Relational Data

Since much of the world's business data is now stored in relational databases, access to relational data is a vitally important application for an XML query language. In this section, we will illustrate the use of Quilt to access relational data by a series of examples based on a very simple database that describes an online auction. The three tables in the auction database are shown below. The USERS table has a row for each user who is registered with the auction, either as a seller or a bidder, with the USERID column as its primary key. The ITEMS table lists all the items that are currently for sale, with the ITEMNO column as primary key and the OFFERED_BY column as a foreign key that matches the USERID of the offering user. The BIDS table lists the bids that have been received for various items, using the USERID and ITEMNO columns as foreign keys to identify the userid of the bidder and the item to which the bid applies.

USERS
USERID NAME RATING
ITEMS
ITEMNO DESCRIPTION OFFERED_BY RESERVE_PRICE
BIDS
USERID ITEMNO BID_AMOUNT BID_DATE

In this section, we will assume that a layer of software running on top of a relational database system presents each table in the form of an XML document. The name of the document element is the same as the name of the table, and each row of the table is represented by a nested element that has the same name as the table with the suffix "_tuple". Inside each row element are nested a series of elements that represent the column-values stored in that row. For example, two rows of the USERS table might be represented by the following XML view:

<users>
  <user_tuple>
    <userid> 1005 </userid>
    <name> Baker </name>
    <rating> B </rating>
  </user_tuple> 
  <user_tuple>
    <userid> 1007 </userid>
    <name> Carter </name>
    <rating> A </rating>
  </user_tuple> 
</users>

The Document Type Definitions for the XML views of the three tables in our online auction database are as follows:

<!DOCTYPE users [
  <!ELEMENT users (user_tuple*)>
  <!ELEMENT user_tuple (userid, name, rating)>
  <!ELEMENT userid (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT rating (#PCDATA)>
]>

<!DOCTYPE items [
  <!ELEMENT items (item_tuple*)>
  <!ELEMENT item_tuple (itemno, description, offered_by, reserve_price)>
  <!ELEMENT itemno (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT offered_by (#PCDATA)>
  <!ELEMENT reserve_price (#PCDATA)>
]>

<!DOCTYPE bids [
  <!ELEMENT bids (bid_tuple*)>
  <!ELEMENT bid_tuple (userid, itemno, bid_amount, bid_date)>
  <!ELEMENT userid (#PCDATA)>
  <!ELEMENT itemno (#PCDATA)>
  <!ELEMENT bid_amount (#PCDATA)>
  <!ELEMENT bid_date (#PCDATA)>
]>

It is clear that many applications will require structured XML views of relational data, in which elements derived from different tables are nested to represent their logical relationships. We believe that an XML query language such as Quilt could be used to construct such structured XML views from simple default XML views of individual tables such as the ones described above. At the end of this section, we will give an example of how such a view can be constructed.

The example queries below illustrate several common types of access to relational data. Since SQL is a well-known relational language, we introduce each query by expressing it in English and in SQL, and then show its representation in Quilt. For each query, we also show a partial result containing enough data to clarify the expected structure of the query result.

Most of the queries shown below generate their output in the form of a single element named <result>, with nested elements that contain the data of the query result. This is a convention that is not required for all queries, as we will see in some examples near the end of the section.

Our first example query is a very simple relational query that retrieves selected columns from the rows of one table that satisfy some search-condition.

(Q1) Find the item number, description, and reserve price for all bicycles that have a reserve price of less than $50, in order by item number.

SQL expression:

SELECT i.itemno, i.description, i.reserve_price
FROM items AS i
WHERE i.description LIKE 'Bicycle'
AND i.reserve_price < 50
ORDER BY i.itemno;

Quilt expression:

<result>
  (
    FOR $i IN document("items.xml")//item_tuple
    WHERE contains($i/description, "Bicycle")
      AND $i/reserve_price < 50
    RETURN
      <item_tuple>
        $i/itemno,
        $i/description,
        $i/reserve_price
      </item_tuple> SORTBY(itemno)
  )
</result>

Partial result:

<result>
  <item_tuple>
    <itemno> 10577 </itemno>
    <description> Mountain Bicycle </description>
    <reserve_price> 45 </reserve_price>
  </item_tuple>
  <item_tuple>
    <itemno> 21684 </itemno>
    <description> 10-speed Bicycle </description>
    <reserve_price> 40 </reserve_price>
  </item_tuple>
</result>

In Q1, the analogy between the clauses in SQL and Quilt is straightforward. The Quilt FOR-clause, like the SQL FROM-clause, binds a variable that ranges over the rows of the table. In both languages, the WHERE-clause is used to filter the set of rows by applying a search condition. In Q1, the search-condition could alternatively have been expressed by using predicates in square brackets in the FOR-clause. The Quilt RETURN-clause provides the function of the SQL SELECT and ORDER BY clauses, generating output, including tags, and controlling the order of the output data.

Query Q2 illustrates how data can be formed into groups and an aggregate function can be applied to each group.

(Q2) For each item that has received a bid, list the item number and the highest bid received for the item, in order by item number.

SQL expression:

SELECT b.itemno, max(b.bid_amount) AS bid
FROM bids AS b
GROUP BY b.itemno
ORDER BY b.itemno;

Quilt expression:

<result>
  (
    FOR $i IN DISTINCT document("bids.xml")//itemno
    LET $b := document("bids.xml")//bid_tuple[itemno = $i]
    RETURN
      <highbid>
        $i,
        <bid> max($b/bid_amount) </bid>
      </highbid> SORTBY(itemno)
  )
</result>

Partial result:

<result>
  <highbid>
    <itemno> 10579 </itemno>
    <bid> 55 </bid>
  </highbid>
  <highbid>
    <itemno> 21648 </itemno>
    <bid> 45 </bid>
  </highbid>
</result>

The following aspects of the Quilt expression for Q2 are worth noting:

  1. Q2 binds two variables. Variable $i ranges over all the distinct item numbers in the BIDS table. Variable $b represents the set of rows in the BIDS table that contain item numbers that match $i. The RETURN clause is executed once for each distinct item number.

  2. The argument of the max function is $b/bid_amount. $b is bound to a set of bid_tuple elements, $b/bid_amount represents the set of bid_amount elements that are nested inside these bid_tuple elements, and max($b/bid_amount) returns the maximum of the values of the bid_amount elements in the set.

  3. In Q2, the condition that appears in square brackets in the LET clause could not have been expressed in a WHERE-clause, since it is essential to the definition of the set to which $b is bound. Any search-conditions that apply to $b in a WHERE-clause would accept or reject the whole set represented by $b rather than individual elements of the set.

Query Q3 represents the type of query called a "join" by relational systems.

(Q3) Find item numbers and descriptions of Bicycles offered by users whose rating is "A", and the names of the offering users (in any order).

SQL expression:

SELECT i.itemno, i.description, u.name AS seller
FROM items AS i, users AS u
WHERE i.offered_by = u.userid
AND i.description LIKE 'Bicycle'
AND u.rating = 'A';

Quilt expression:

<result>
  (
    FOR $i IN document("items.xml")//item_tuple,
        $u IN document("users.xml")//user_tuple

    WHERE $i/offered_by = $u/userid
    AND contains($i/description, "Bicycle")
    AND $u/rating = "A"

    RETURN
      <interesting_item>
        $i/itemno,
        $i/description,
        <seller> $u/name/text() </seller>
      </interesting_item>
  )
</result>

Partial result:

<result>
  <interesting_item>
    <itemno> 10579 </itemno>
    <description> Mountain Bicycle </description>
    <seller> Jones </seller>
  </interesting_item>
  <interesting_item>
    <itemno> 20860 </itemno>
    <description> Racing Bicycle </description>
    <seller> Smith </seller>
  </interesting_item>
</result>

The result of Query Q3 includes some elements whose names are derived from the original document and some other elements whose names are generated by the query itself. For example, the expression $i/itemno evaluates to an element named "itemno", which is included in the query result with its original tag. The expression $u/name would have generated an element named "name". Since it is desired to include the text content of the name element in the query result with a different tag ("seller"), Q3 uses the text() function of XPath to extract the text content and encloses it in explicit begin- and end-tags.

Query Q4 is a relatively complex query that combines the relational functions of join, grouping, and ordering.

(Q4) List item number, description, number of bids, and highest bid for items that have received at least 50 bids, in descending order by highest bid.

SQL expression:

SELECT i.itemno, i.description, 
       count(*) AS bid_count,
       max(b.bid_amount) AS high_bid
FROM items AS i, bids AS b
WHERE i.itemno = b.itemno
GROUP BY i.itemno, i.description
HAVING count(*) >= 50
ORDER BY high_bid DESC;

Quilt expression:

<result>
  (
    FOR $i IN document("items.xml")//item_tuple,
    LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
    WHERE count($b) >= 50
    RETURN
      <popular_item>
        $i/itemno,
        $i/description,
        <bid_count> count($b) </bid_count>,
        <high_bid> max($b/bid_amount) </high_bid>
      </popular_item> SORTBY(high_bid) DESCENDING
  )
</result>

(Partial result):

<result>
  <popular_item>
    <itemno> 93558 </itemno>
    <description> Tennis Racket </description>
    <bid_count> 61 </bid_count>
    <high_bid> 20 </high_bid>
  </popular_item>
  <popular_item>
    <itemno> 45076 </itemno>
    <description> Toaster </description>
    <bid_count> 57 </bid_count>
    <high_bid> 11 </high_bid>
  </popular_item>
</result>

In the Quilt expression for Q4, the WHERE clause applies a filtering condition to the sets that are bound to $b in the LET clause. In this case, the WHERE clause of Quilt is analogous to the HAVING clause of SQL.

Query Q5 matches information from the USERS table with information from the ITEMS table, preserving those rows of the USERS table that have no counterpart in the ITEMS table. In relational terminology, this type of query is called an "outer join."

(Q5) List the names of all users and the descriptions of the items they have offered for sale. Users who have not offered any items should be included, and output should be ordered by alphabetically by user name and secondarily by item description.

SQL expression:

SELECT u.name, i.description AS offering
FROM users AS u OUTER JOIN items AS i ON u.userid = i.offered_by
ORDER BY name, offering;

Quilt expression:

<result>
  (
    FOR $u IN document("users.xml")//user_tuple
    RETURN
      <user>
        $u/name,
        (
          FOR $i IN document("items.xml")//item_tuple
                      [itemno = $u/offered_by]
          RETURN
            <offering> 
              $i/description/text()
            </offering> SORTBY(.)
        )
      </user> SORTBY(name)
  )
</result>

(Partial result):

<result>
  <user> 
      <name> Andrews </name>
      <offering> Golf club </offering>
      <offering> Tennis racket </offering>
  </user>
  <user> 
      <name> Baker </name>
  </user>
  <user> 
      <name> Caswell </name>
      <offering> Coffee maker </offering>
  </user>
</result>

The following features of the Quilt expression for Q5 are worth noting:

  1. The results of the SQL query and the Quilt query are not exactly the same. Since SQL is a relational language, the result of an SQL query is always a table. In the SQL query result for Q5, each user's name is repeated in the query output for each item offered by that user. Quilt, on the other hand, takes advantage of the hierarchical nature of XML to avoid repeating user names. In the Quilt query result for Q5, each user name appears once, in an element with the tag USER, and all the items offered by a given user are nested inside the appropriate USER element.

  2. In previous queries, when a SORTBY clause was used to specify the ordering of a set of elements in the query result, the argument of SORTBY was the name of the nested element to be used as the sort key. In the result of Q5, however, we wish the <offering> elements to be sorted by their own string values rather than by the value of a nested element. Therefore, we specify "SORTBY(.)", indicating that the sort key is the element itself.

Q6 illustrates the use of the exists() function, which takes a set of elements as its argument and returns True if the set is non-empty. In Q6, the argument of the exists() function is an XPath expression.

(Q6) List names of users who have not offered anything for sale, in alphabetical order.

SQL expression:

SELECT u.name
FROM users AS u
WHERE NOT EXISTS
  (SELECT i.itemno
   FROM items AS i
   WHERE i.offered_by = u.userid)
ORDER BY name;

Quilt expression:

FOR $u IN document("users.xml")//user_tuple
WHERE NOT exists
   (document("items.xml")//item_tuple[offered_by = $u/userid])
RETURN $u/name SORTBY(.)

(Partial result):

<name> Baker </name>
<name> Blackstone </name>
<name> Thompson </name>

Unlike our previous examples, Q6 does not "wrap" its result in an enclosing element such as <result>. Instead, Q6 returns a list of <name> elements, ordered by their content. It is also possible for a Quilt query to return a simple value, as in Q7.

(Q7) Find the highest amount that has ever been bid for any item.

SQL expression:

SELECT max(bid_amount)
FROM bids;

Quilt expression:

LET $b := document("bids.xml")//bid_amount
RETURN max($b)

Example result:

  5250

We have previously mentioned the importance of providing a nested view of a relational database. A Quilt query can be used to define such a view, based on simple default views such as the ones in the above examples. A view definition facility might take the form of a DEFINE statement that uses a query to define a view that can be subsequently used in other queries and view definitions as though it were a real document. A system that implements such a facility would need to maintain a catalog of view definitions. Although the details of view definition in Quilt remain to be worked out, we provide in Q8 an example of what a DEFINE statement might look like.

(Q8) Construct a nested XML view of the auction database, consisting of a single Auction element. Inside the Auction element, include a User element for each registered user, in order by userid. If a user has offered any items for sale, these items are represented by Item elements nested inside the User element, ordered by item number. If an item has received any bids, these bids are represented by Bid elements nested inside the Item element, ordered by bid date.

DEFINE $auction AS
  <auction>
    (
      FOR $u IN document("users.xml")//user_tuple
      RETURN 
        <User>
          $u/userid,
          $u/name,
          $u/rating,
          (
            FOR $i IN document("items.xml")//item_tuple
            WHERE $i/offered_by = $u/userid
            RETURN
              <Item>
                $i/itemno,
                $i/description,
                $i/reserve_price,
                (
                  FOR $b IN document("bids.xml")//bid_tuple
                  WHERE $b/itemno = $i/itemno
                  RETURN
                    <Bid>
                      $b/userid,
                      $b/bid_amount,
                      $b/bid_date,
                    </Bid> SORT BY bid_date
                )
              </Item> SORT BY itemno
          )
        </User> SORT BY userid
    )
</auction>

5 Querying Structured Documents

In XML, the two main forms of structure are hierarchy and sequence. Since purely relational databases have neither hierarchy nor sequence, most database systems do not support queries that take advantage of this structure. Several queries shown earlier in this paper create hierarchies from flat relational data, map one hierarchy onto another hierarchy, or flatten hierarchical data to show a tabular view. We have also seen some queries that perform some transformation on a document while preserving its hierarchical structure. In this section, we will investigate queries that use the hierarchy or sequence of a document to formulate conditions. Our queries will be based the following surgical data, provided to us by Dr. Tom Lincoln, taken from a fictitious patient record encoded in a HL7 Patient Record Architecture format:

<section><section.title>Procedure</section.title> 
  The patient was taken to the operating room where 
  she was placed in a supine position and 
  <Anesthesia>induced under general anesthesia.</Anesthesia> 
  <Prep>
  <action>Foley catheter was placed to decompress the bladder</action> 
  and the abdomen was then prepped and draped in sterile fashion. 
  </Prep>
  <Incision>
  A curvilinear incision was made 
  <Geography>in the midline immediately infraumbilical</Geography> 
  and the subcutaneous tissue was divided 
  <Instrument>using electrocautery.</Instrument> 
  </Incision>
  The fascia was identified and 
  <action>#2 0 Maxon stay sutures were placed 
  on each side of the midline.</action> 
  <Incision>
  The fascia was divided using 
  <Instrument>electrocautery</Instrument> 
  and the peritoneum was entered. 
  </Incision>
  <Observation>The small bowel was identified</Observation> 
  and 
  <action>
  the 
  <Instrument>Hasson trocar</Instrument> 
  was placed under direct visualization. 
  </action>
  <action>
  The 
  <Instrument>trocar</Instrument>
  was secured to the fascia using the stay sutures.
  </action>
<!-- The section continues, but we excerpt it here... -->

Note that this document has a structure like that generally associated with documents, but contains data that partly resembles traditional database data. This is a useful general approach to embedding data in narrative or other kinds of documents.

First we will examine a few queries that use absolute position. XPath expressions use subscripts to indicate absolute position. Consider the following query:

(Q9): In each section with title "Procedure", what Instruments were used in the second Incision?

FOR $s IN //section[section.title = "Procedure"]
RETURN ($s//Incision)[2]/Instrument

Note that the precedence rules of XPath do not give the desired result without the parentheses in the RETURN clause. The expression "$s//Incision[2]" would return nodes that are the second Incision node within their immediate parent. Since we are looking for the second Incision among all the descendants of $s, regardless of level, we use the expression "($s//Incision)[2]".

Subscripts can also be used to indicate ranges. Consider the following query:

(Q10): In each section with title "Procedure", what are the first two instruments to be used?

FOR $s IN //section[section.title = "Procedure"]
RETURN ($s//Instrument)[1-2]

Now we will consider three queries that use the BEFORE and AFTER operators of XQL. The expression "exp1 BEFORE exp2", in which exp1 and exp2 are path expressions, evaluates to the list of nodes selected by exp1 that occur before at least one node selected by exp2. We will start with a query that combines AFTER with subscripts:

(Q11): What instruments were used in the first two actions after the second incision?

   FOR $i IN (//Incision)[2] ,
       $a IN (//action AFTER $i)[1-2]
   RETURN $a//Instrument

Next we will consider a query where the sequence has a more dramatic significance:

(Q12): Find "Procedure" sections where no anesthesia element occurs before the first incision.

   FOR $proc IN //section[section.title="Procedure"]
   WHERE NOT exists( $proc//Anesthesia BEFORE ($proc//Incision)[1] )
   RETURN $proc

In many cases, BEFORE and AFTER are used to reconstruct a narrative, as in the following example.

(Q13): In the first procedure, what happened between the first incision and the second incision?

   FOR $proc IN //section[section.title="Procedure"][1],
       $bet IN $proc//((* AFTER ($proc//incision)[1]) 
                          BEFORE ($proc//incision)[2])
   RETURN $bet

6 Exploiting References in Documents

References from one element to another are an important part of XML. Since XPath expressions are a basic building block of the Quilt grammar, Quilt could exploit references by using the ID function of XPath, which was designed for this purpose. Because of the importance of references, however, Quilt introduces a new reference-related abbreviation that can be used in XPath expressions in Quilt queries.

In this section, we will describe the Quilt syntax for following references. Our examples will be based on intra-document references using attributes of type ID and IDREF, but we believe that a straightforward extension can accomodate inter-document references based on the XPointer specification.

Quilt introduces the notation X-> as an abbreviation for the XPath function ID(X). The expression X-> evaluates to the set of nodes that are referenced by X. The commonest use of this notation is with an attribute of type IDREF or IDREFS. For example, suppose that an element of type Figref has an IDREF-type attribute named "target" that references a Figure element. Then, in a Quilt query, the following expression could be used to find all Figure elements that are targets of Figref elements:

//Figref/@target->

The Quilt "right-arrow" notation allows an expression containing a reference to be read from left to right, and is intended to be easier to read than the XPath functional notation. For example, the following expression finds the titles of all Figure elements that are targets of Figref elements:

//Figref/@target->/Title

To illustrate the use of references in Quilt queries, suppose that we have a repository containing a collection of papers. The repository might be represented as a large XML document in which each paper is represented by a Paper element with a unique ID attribute. Nested immediately inside each Paper element is a Title element. Many of the papers reference other papers, and these references are represented by Reference elements nested at some level inside the Paper elements, each containing an attribute named "paperid" of type IDREF that matches the ID attribute of the referenced paper.

The example query below illustrates the use of references, and also illustrates how a Quilt query can transform the structure of a document. In this example, Title elements in the input document are transformed into attributes in the output document.

(Q14): Generate a list of papers, each identified by its title, and each containing a list of the titles of all the papers that it references.

FOR $p IN //Paper
RETURN
  <Paper title=$p/Title>
    (
    FOR $r IN $p//Reference
    RETURN
      <Reference title = $r/@paperid->/Title />
    )
  </Paper>

The following partial result illustrates the structure of the document that might be generated by Q14:

<Paper title="The Genius of Mark Twain">
   <Reference title="Innocents Abroad" />
   <Reference title="Huckleberry Finn" />
</Paper>
<Paper title="The Social Commentary of John Steinbeck">
   <Reference title="Grapes of Wrath" />
   <Reference title="Cannery Row" />
</Paper>

7 Conclusion

Traditionally, documents and databases have been considered to be separate and distinct forms of information. Documents have an irregular structure in which hierarchy and sequence are very important. Databases have a much more regular structure and, in relational databases, place little importance on hierarchy and sequence. With the emergence of XML, the sharp distinction between different forms of information such as documents and databases is quickly disappearing. Unfortunately, most existing query languages are either document-oriented or database-oriented, and find it awkward or impossible to express queries outside the domain for which they were designed.

The Quilt language attempts to pull together features from several languages that enable it to operate on a broad range of data sources. From XPath and XQL it draws a powerful path expression syntax that can navigate inside a hierarchical document, selecting a set of nodes that satisfy a complex predicate. From XML-QL it draws the notion of bound variables and a versatile syntax that can generate an output document of arbitrary structure.

This paper has illustrated the versatility of Quilt by using it to express queries against diverse sources, ranging from documents to relational databases. On relational data, we have written queries involving joins, outer joins, and grouping. On documents, we have written queries that preserve hierarchy and exploit ordering and references. We believe that Quilt represents a promising approach to a query language that can help to realize the potential of XML as a universal medium for data interchange.

8 References

Exemplars
Mary Fernandez et al. XML Query Languages: Experiences and Exemplars. See http://www-db.research.bell-labs.com/user/simeon/xquery.html
Lorel
Serge Abiteboul, Dallan Quass, Jason McHugh, Jennifer Widom, and Janet L. Wiener. The Lorel Query Language for Semistructured Data. International Journal on Digital Libraries, 1(1):68-88, April 1997. See http://www-db.stanford.edu/~widom/pubs.html
Tree Structure
Jonathan Robie. The Tree Structure of XML Queries. See http://www.w3.org/XML/Group/1999/10/xquery-tree.html
ODMG
Rick Cattell et al. The Object Database Standard: ODMG-93, Release 1.2. Morgan Kaufmann Publishers, San Francisco, 1996.
XML
World Wide Web Consortium. Extensible Markup Language (XML) 1.0. W3C Recommendation. See http://www.w3.org/TR/1998/REC-xml-19980210
XML-QL
Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu. A Query Language for XML. See http://www.research.att.com/~mff/files/final.html
XML Query Requirements
World Wide Web Consortium. XML Query Requirements. W3C Working Draft. See http://www.w3.org/TR/xmlquery-req
XPath
World Wide Web Consortium. XML Path Language (XPath) Version 1.0. W3C Recommendation. See http://www.w3.org/TR/xpath.html
XPointer
World Wide Web Consortium. XML Pointer Language (XPointer). W3C Working Draft. See http://www.w3.org/TR/WD-xptr
XQL
J. Robie, J. Lapp, D. Schach. XML Query Language (XQL). See http://www.w3.org/TandS/QL/QL98/pp/xql.html
XSLT
World Wide Web Consortium. XSL Transformations (XSLT). W3C Recommendation. See http://www.w3.org/TR/xslt
YATL
S. Cluet, S. Jacqmin, and J. Simeon. The New YATL: Design and Specifications. Technical Report, INRIA, 1999.