- Schema Mapping
At IBM Almaden and in collaboration
with the University of Toronto, we developed Clio,
a semi-automatic system for data translation between
different formats (schemas). Using a visual interface, a relatively
non-expert user can rapidly construct, explore and select among several
alternative ways of transforming data that conforms to a source schema into data
that conforms to a target schema. One of the novelties of the system is the automatic
derivation of a
finite set of queries (XQuery, XSLT or SQL enhanced with id creation)
from the visual specification of the schema mapping. The set of
queries is derived from the integrity constraints of the source,
in such a way that the integrity constraints of the target are satisfied.
The use of constraints makes it possible to build "intelligent" transformations
between schemas with quite complex relationships (whether these schemas
are relational or XML).
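As a rough illustration of the idea of deriving SQL "enhanced with id creation" from a mapping, here is a minimal sketch. It is not Clio's actual algorithm or specification format: the `MAPPING` dictionary, the table names (`emp`, `dept`, `employee`), and the helper functions are all hypothetical. A Skolem-style term builds the same department id from the same source value, so the generated queries keep the target foreign key satisfied.

```python
import sqlite3

# Hypothetical mapping spec (an assumption, not Clio's format): each target
# column is filled from a source column or from a Skolem term
# ("SK", function_name, [source columns]) that creates an id.
MAPPING = {
    "source": "emp",
    "targets": {
        "dept": {"did": ("SK", "dept_id", ["dept_name"]), "dname": "dept_name"},
        "employee": {"ename": "name", "did": ("SK", "dept_id", ["dept_name"])},
    },
}

def sql_expr(spec):
    """Turn a column spec into a SQL expression over the source table."""
    if isinstance(spec, str):
        return spec
    _, fn, args = spec
    joined = " || ',' || ".join(args)
    # The id is a string such as dept_id(R&D): equal inputs give equal ids.
    return f"'{fn}(' || {joined} || ')'"

def generate_queries(mapping):
    """Produce one INSERT ... SELECT per target table."""
    queries = []
    for table, cols in mapping["targets"].items():
        names = ", ".join(cols)
        exprs = ", ".join(sql_expr(s) for s in cols.values())
        queries.append(
            f"INSERT INTO {table} ({names}) "
            f"SELECT DISTINCT {exprs} FROM {mapping['source']}"
        )
    return queries

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp(name TEXT, dept_name TEXT);
    CREATE TABLE dept(did TEXT PRIMARY KEY, dname TEXT);
    CREATE TABLE employee(ename TEXT, did TEXT REFERENCES dept(did));
    INSERT INTO emp VALUES ('Ann', 'R&D'), ('Bob', 'R&D'), ('Cy', 'Sales');
""")
for query in generate_queries(MAPPING):
    conn.execute(query)

depts = conn.execute("SELECT did, dname FROM dept ORDER BY dname").fetchall()
orphans = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE did NOT IN (SELECT did FROM dept)"
).fetchone()[0]
print(depts, orphans)
```

Both generated queries use the same Skolem term for `did`, which is why every `employee` row refers to an existing `dept` row even though the source has no department ids.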
- Clio Grows Up: From
Research Prototype to Industrial Tool, with Laura Haas,
Mauricio A. Hernandez, Howard Ho, and Mary Roth. Industrial paper in
SIGMOD'05, Baltimore, MD, June 2005.
- Translating Web Data, with
Yannis Velegrakis, Renee J. Miller, Mauricio A. Hernandez, and
Ronald Fagin. VLDB'02, Hong Kong SAR, China, August 2002, pp.598-609.
- Schema Management,
Ronald Fagin, Ariel Fuxman, Laura Haas, Mauricio Hernandez,
Howard Ho, Anastasios Kementsietsidis,
Renee J. Miller, Felix Naumann,
Yannis Velegrakis, Charlotte Vilarem and Ling-Ling Yan,
Engineering Bulletin 25, 3, 2002, pp. 33-39.
- Mapping XML and
Schemas with Clio, with Mauricio A. Hernandez, Yannis
Velegrakis, Renee J. Miller, Felix Naumann and Howard Ho. System Demo
in ICDE 2002, San Jose, CA.
- Clio Project: Managing Heterogeneity, with Renee J.
Miller, Mauricio A. Hernandez, Laura M.
Haas, Ling-Ling Yan, C. T. Howard Ho and Ronald Fagin. SIGMOD Record
30(1), March 2001, pp. 78-83.
- Schema Mapping Management.
The goal of this project is to
study the algorithmic and foundational aspects as well as
the implementation issues surrounding a system for schema mapping
management. Such a system is envisioned to be an integral part of
any meta-data management system that enables cooperation between
applications at the data level. A
somewhat similar framework (called there model management) is being
investigated by Phil Bernstein and his group at Microsoft
Research. Some of the main challenges that my collaborators and I are addressing:
- design of a schema
mapping language that:
- is high-level, declarative, and logic-based,
- is simple enough to be understood and manipulated by
- facilitates (semi-)automatic generation of schema
- conveys enough information for runtime
(e.g., to exchange data or answer queries, to generate XSLT
- semantics of
schema mappings and data exchange
based on schema mappings:
- what does a schema mapping mean in terms of the actual
instances of the schemas?
- how do we perform data translation from one schema to
another, based on a schema mapping specification?
- code generation
based on schema mappings:
- how to generate efficient queries or transformations in
various languages to (fully or partially) implement the mappings
- Clio is an example of a first step in this direction
- query answering and
query rewriting over schema mappings:
- how to rewrite a query over one
schema into a query over another schema, given the schema mapping
- federation and data integration aspects of query
answering and rewriting
- composition of
sequential schema mappings:
- this is a crucial feature that enables re-use of
mappings when schemas are different or change
- preserving mappings
under schema evolution.
- schema evolution is a hard problem in many ways; here
we are addressing the problem of maintaining schema mappings in the
face of schema evolution (whether this evolution is incremental or more
- we have explored two approaches so far:
- incremental adaptation algorithm, using a
change-based representation of schema evolution (see VLDB'03 paper)
- mapping composition approach, using a mapping-based
representation of schema evolution (see VLDB'05 paper)
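To make the data exchange and query answering items in the list above more concrete, here is a small sketch under stated assumptions. The relations `Takes`, `Student` and `Enrolled`, and the single dependency Takes(n, c) -> exists s. Student(s, n) and Enrolled(s, c), are invented for illustration. The code chases a source instance into a target instance with labeled nulls, then answers a conjunctive query by evaluating it on that instance and dropping tuples that contain nulls.

```python
from itertools import count

class Null:
    """A labeled null: a placeholder for an unknown target value."""
    _ids = count(1)

    def __init__(self):
        self.label = f"N{next(Null._ids)}"

    def __repr__(self):
        return self.label

def exchange(takes):
    """Chase with Takes(n, c) -> exists s. Student(s, n) and Enrolled(s, c).

    Each source tuple gets a fresh null for the unknown student id s.
    """
    student, enrolled = [], []
    for name, course in sorted(takes):
        s = Null()
        student.append((s, name))
        enrolled.append((s, course))
    return student, enrolled

def certain_answers(rows):
    """Keep only the answer tuples that contain no labeled nulls."""
    return sorted({r for r in rows if not any(isinstance(v, Null) for v in r)})

takes = {("Ann", "DB"), ("Ann", "Logic"), ("Bob", "DB")}
student, enrolled = exchange(takes)

# q(n, c) :- Student(s, n), Enrolled(s, c): join on the (possibly null) id.
joined = [(n, c) for s1, n in student for s2, c in enrolled if s1 is s2]
# q_ids(s) :- Student(s, n): every answer is a null, so nothing is certain.
ids = [(s,) for s, _ in student]

print(certain_answers(joined))
print(certain_answers(ids))
```

The first query recovers exactly the source pairs, because the nulls only serve as join values. The second has no certain answers, since the mapping says nothing about the actual student ids.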
- Semantic Adaptation
Mappings when Schemas Evolve, with Cong Yu. VLDB'05, Trondheim,
Norway, September 2005. To appear.
- Composing Schema
Mappings: Second-Order Dependencies to the Rescue, with Ronald
Fagin, Phokion G. Kolaitis and Wang-Chiew Tan. PODS'04, Paris, France,
June 2004, pp. 83-94.
- XML Query Rewriting for Data Integration, with Cong Yu.
SIGMOD'04, Paris, France, June 2004, pp. 371-382.
- Adaptation under Evolving Schemas, with Yannis Velegrakis and
Renee J. Miller. VLDB'03, Berlin, Germany, September 2003, pp. 584-595.
The full version (published in the VLDB Journal) can be found here.
- Data Exchange: Getting to
the Core, with Ronald Fagin and Phokion G. Kolaitis.
PODS'03, San Diego,
California, June 2003, pp. 90-101. The full version of this paper (to
appear in TODS) can
be found here.
- Data Exchange:
and Query Answering (Springer LINK), with Ronald Fagin, Phokion G. Kolaitis and Renee
Miller. ICDT'03, Siena, Italy, January 2003, pp. 207-224. The full
version of this paper (to appear in TCS) can be found here.
- Query Optimization with Chase and Backchase
At the University of
Pennsylvania, my thesis work focused on a new
and interesting technique for query rewriting. This technique uses two
basic rules: chase and backchase. A query is chased with constraints in
order to produce a larger, but equivalent, query that incorporates all
the alternate ways of answering the original query (views, indexes,
or OO classes). This larger query can then be minimized, by using the
backchase rule, to produce a complete set of minimal and
equivalent rewritings. There are many examples for which the C&B
algorithm finds rewritings that are good candidates for execution but
cannot be discovered by the more traditional query rewriting techniques.
One novelty of the algorithm is
that it unifies disparate techniques
such as semantic optimization based on integrity constraints,
rewriting queries using views, and rewriting queries using indexes. In
fact, it can do any rewriting as long as the right set of
constraints is specified. The data model considered includes nested
relations as well as OO
Many of the ideas, concepts, and
techniques, as well as the experience
accumulated while working on C&B, turned out to be quite influential
for my later projects (e.g., Clio).
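The chase and backchase steps described above can be sketched for plain conjunctive queries as follows. This is a simplified illustration, not the thesis implementation: it ignores nested relations and OO features, assumes all terms are variables, and assumes the chase terminates for the given constraints. The relations `R`, `S` and the view `V` are hypothetical, and the view is described by two constraints, one in each direction.

```python
from itertools import combinations, count

_fresh = count()

def homomorphisms(src, dst, h=None):
    """Yield each variable mapping that sends every atom of src to an atom of dst."""
    h = dict(h or {})
    if not src:
        yield h
        return
    (rel, *args), rest = src[0], src[1:]
    for cand in dst:
        if cand[0] != rel or len(cand) != len(src[0]):
            continue
        g = dict(h)
        if all(g.setdefault(a, b) == b for a, b in zip(args, cand[1:])):
            yield from homomorphisms(rest, dst, g)

def chase(body, tgds):
    """Add constraint conclusions until every constraint is satisfied.

    Assumes termination, which holds for the view constraints used here.
    """
    body = list(body)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in tgds:
            for h in list(homomorphisms(premise, body)):
                if next(homomorphisms(conclusion, body, h), None) is None:
                    g = dict(h)
                    for atom in conclusion:
                        for v in atom[1:]:
                            if v not in g:  # existential variable: fresh name
                                g[v] = f"_n{next(_fresh)}"
                    body += [(a[0], *(g[v] for v in a[1:])) for a in conclusion]
                    changed = True
    return body

def backchase(head, body, tgds):
    """Return the minimal subqueries of the chased query equivalent to the original."""
    universal = chase(body, tgds)
    fixed = {v: v for v in head}
    found = []
    for k in range(1, len(universal) + 1):
        for sub in combinations(universal, k):
            if any(set(m) <= set(sub) for m in found):
                continue  # not minimal: contains a rewriting already found
            if not set(head) <= {v for a in sub for v in a[1:]}:
                continue  # head variables must occur in the subquery
            # Equivalent if the original body maps into the chased subquery.
            if next(homomorphisms(body, chase(sub, tgds), fixed), None) is not None:
                found.append(list(sub))
    return found

# View V(x, z) = R(x, y) joined with S(y, z), stated as two constraints.
tgds = [
    ([("R", "x", "y"), ("S", "y", "z")], [("V", "x", "z")]),
    ([("V", "x", "z")], [("R", "x", "y"), ("S", "y", "z")]),
]
head, body = ("x", "z"), [("R", "x", "y"), ("S", "y", "z")]
rewritings = backchase(head, body, tgds)
print(rewritings)
```

For this query the chase adds the view atom, and the backchase then finds two minimal rewritings: one that scans only `V`, and the original join of `R` and `S`. The subset enumeration is exponential, so this is only meant to show the shape of the technique.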
- Chase Too Far?, with Alin Deutsch, Arnaud Sahuguet and
Val Tannen. SIGMOD'00, Dallas, Texas, May 2000, pp. 273-284.
- Physical Data Independence,
Constraints, and Optimization with Universal Plans, with
Alin Deutsch and Val Tannen. VLDB'99, Edinburgh,
Scotland, September 1999, pp. 459-470.
- Equational Chase for Path-Conjunctive Queries, Constraints, and Views,
with Val Tannen. ICDT'99, Jerusalem, Israel, January
More about C&B query
optimization can be found at the UPenn
site or in my
Optimization with Chase and Backchase, PhD Thesis, 2000,
Univ. of Pennsylvania, Advisor: Val Tannen.