Research Projects
- Schema Mapping Generation (Clio).
At IBM Almaden, in collaboration with the University of Toronto, we
developed Clio, a semi-automatic system for translating data between
different formats (schemas). Using a visual interface, a relatively
non-expert user can rapidly construct, explore, and select among several
alternative ways of transforming data that conforms to a source schema
into data that conforms to a target schema. One of the novelties of the
system is the automatic derivation of a finite set of queries (XQuery,
XSLT, or SQL enhanced with id creation) from the visual specification of
the schema mapping. The set of queries is derived based on the integrity
constraints of the source, in such a way that the integrity constraints
of the target are satisfied. The use of constraints makes it possible to
build "intelligent" transformations between schemas with quite complex
relationships (whether these schemas are relational or XML).
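To make "id creation" concrete, here is a minimal sketch in Python. It is not Clio's generated code or its algorithm. The schemas (a source `Emp(name, dept_name)` and targets `Employee(name, dept_id)` and `Department(dept_id, dept_name)`), the `skolem` helper, and the sample rows are all hypothetical. The sketch only shows how a deterministic, function-style id lets a transformation populate two target relations so that the target foreign key holds.

```python
def skolem(function_name, *args):
    """Build a deterministic surrogate id from the values it depends on.

    Hypothetical helper: the same arguments always yield the same id, so
    rows derived from the same department share one Department tuple.
    """
    return f"{function_name}({', '.join(map(str, args))})"


def translate(emp_rows):
    """Translate source Emp(name, dept_name) rows into the two target relations."""
    employees = []
    departments = set()
    for name, dept_name in emp_rows:
        dept_id = skolem("DeptId", dept_name)    # invented id, one per department
        employees.append((name, dept_id))        # Employee(name, dept_id)
        departments.add((dept_id, dept_name))    # Department(dept_id, dept_name)
    return employees, sorted(departments)


# Hypothetical source instance.
source_emp = [("Ann", "R&D"), ("Bob", "R&D"), ("Eve", "Sales")]
employees, departments = translate(source_emp)

# Target constraint: every Employee.dept_id must reference a Department.
department_ids = {dept_id for dept_id, _ in departments}
foreign_key_holds = all(dept_id in department_ids for _, dept_id in employees)
```

Because the id is a function of the department name, Ann and Bob end up referencing the same Department tuple; a fresh counter-based id per source row would instead produce duplicate departments.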
Publications:
- Clio Grows Up: From Research Prototype to Industrial Tool, with Laura
Haas, Mauricio A. Hernandez, Howard Ho, and Mary Roth. Industrial paper
in SIGMOD'05, Baltimore, MD, June 2005.
- Translating Web Data, with Yannis Velegrakis, Renee J. Miller,
Mauricio A. Hernandez, and Ronald Fagin. VLDB'02, Hong Kong SAR, China,
August 2002, pp. 598-609.
- Schema Management, with Periklis Andritsos, Ronald Fagin, Ariel
Fuxman, Laura Haas, Mauricio Hernandez, Howard Ho, Anastasios
Kementsietsidis, Renee J. Miller, Felix Naumann, Yannis Velegrakis,
Charlotte Vilarem and Ling-Ling Yan. IEEE Data Engineering Bulletin
25, 3, 2002, pp. 33-39.
- Mapping XML and Relational Schemas with Clio, with Mauricio A.
Hernandez, Yannis Velegrakis, Renee J. Miller, Felix Naumann and Howard
Ho. System Demo in ICDE 2002, San Jose, CA.
- The Clio Project: Managing Heterogeneity, with Renee J. Miller,
Mauricio A. Hernandez, Laura M. Haas, Ling-Ling Yan, C. T. Howard Ho
and Ronald Fagin. SIGMOD Record 30(1), March 2001, pp. 78-83.
- Schema Mapping Management.
The goal of this project is to
study the algorithmic and foundational aspects as well as
the implementation issues surrounding a system for schema mapping
management. Such a system is envisioned to be an integral part of
any meta-data management system that enables cooperation between
applications at the data level. A somewhat similar framework (called
model management there) is being investigated by Phil Bernstein and his
group at Microsoft Research. Some of the main challenges that my
collaborators and I are addressing are:
- design of a schema mapping language that:
  - is high-level, declarative, and logic-based,
  - is simple enough to be understood and manipulated by tools,
  - facilitates (semi-)automatic generation of schema mappings,
  - conveys enough information for use at runtime (e.g., to exchange
    data, answer queries, or generate XSLT transformations)
- semantics of schema mappings and of data exchange based on schema
  mappings:
  - what does a schema mapping mean in terms of the actual instances
    of the schemas?
  - how do we perform data translation from one schema to another,
    based on a schema mapping specification?
- code generation based on schema mappings:
  - how to generate efficient queries or transformations in various
    languages to (fully or partially) implement the mappings
  - Clio is an example of a first step in this direction
- query answering and query rewriting over schema mappings:
  - how to rewrite a query over one schema into a query over another
    schema, given the schema mapping
  - federation and data integration aspects of query answering and
    rewriting
- composition of sequential schema mappings:
  - this is a crucial feature that enables re-use of mappings when
    schemas are different or change
- preserving mappings under schema evolution:
  - schema evolution is a hard problem in many ways; here we are
    addressing the problem of maintaining schema mappings in the face
    of schema evolution (whether this evolution is incremental or more
    drastic)
  - we have explored two approaches so far:
    - an incremental adaptation algorithm, using a change-based
      representation of schema evolution (see VLDB'03 paper)
    - a mapping composition approach, using a mapping-based
      representation of schema evolution (see VLDB'05 paper)
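The question of what a schema mapping means in terms of instances can be illustrated with a small sketch. Assume, purely for illustration, a mapping given by a single logic-based constraint: for every source tuple Emp(n, d) there must exist some id such that the target contains Employee(n, id) and Department(id, d). The relation names, the function `satisfies`, and the sample data below are hypothetical and are not taken from the papers listed in this section. A pair (source instance, target instance) belongs to the mapping exactly when the check succeeds.

```python
def satisfies(source_emp, target_employee, target_department):
    """Check the constraint  Emp(n, d) -> exists id: Employee(n, id) and Department(id, d).

    Each argument is a set of tuples. The function returns True when every
    source tuple has at least one witness id in the target instance.
    """
    for name, dept_name in source_emp:
        has_witness = any(
            (dept_id, dept_name) in target_department
            for (employee_name, dept_id) in target_employee
            if employee_name == name
        )
        if not has_witness:
            return False
    return True


# Hypothetical source instance.
source = {("Ann", "R&D"), ("Eve", "Sales")}

# One possible target instance; "N1" and "N2" play the role of invented ids.
good_employee = {("Ann", "N1"), ("Eve", "N2")}
good_department = {("N1", "R&D"), ("N2", "Sales")}

# A target instance that loses Eve's department information.
bad_employee = {("Ann", "N1"), ("Eve", "N2")}
bad_department = {("N1", "R&D")}

good_is_solution = satisfies(source, good_employee, good_department)
bad_is_solution = satisfies(source, bad_employee, bad_department)
```

Many different target instances can satisfy the same constraint for a given source (any choice of ids works), which is why choosing which target instance to materialize is itself part of the semantics question.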
Publications:
- Semantic Adaptation of Schema Mappings when Schemas Evolve, with Cong
Yu. VLDB'05, Trondheim, Norway, September 2005. To appear.
- Composing Schema Mappings: Second-Order Dependencies to the Rescue,
with Ronald Fagin, Phokion G. Kolaitis and Wang-Chiew Tan. PODS'04,
Paris, France, June 2004, pp. 83-94.
- Constraint-Based XML Query Rewriting for Data Integration, with Cong
Yu. SIGMOD'04, Paris, France, June 2004, pp. 371-382.
- Mapping Adaptation under Evolving Schemas, with Yannis Velegrakis and
Renee J. Miller. VLDB'03, Berlin, Germany, September 2003, pp. 584-595.
A full version was published in the VLDB Journal.
- Data Exchange: Getting to the Core, with Ronald Fagin and Phokion G.
Kolaitis. PODS'03, San Diego, California, June 2003, pp. 90-101. The
full version of this paper is to appear in TODS.
- Data Exchange: Semantics and Query Answering, with Ronald Fagin,
Phokion G. Kolaitis and Renee J. Miller. ICDT'03, Siena, Italy, January
2003, pp. 207-224. The full version of this paper is to appear in TCS.
- Query Optimization with Chase and Backchase (C&B).
At the University of Pennsylvania, my thesis work focused on a new and
interesting technique for query rewriting. This technique uses two
basic rules: chase and backchase. A query is chased with constraints in
order to produce a larger, but equivalent, query that incorporates all
the alternate ways of answering the original query (views, indexes,
other relations or OO classes). This larger query can then be minimized,
by using the backchase rule, to produce a complete set of minimal and
equivalent rewritings. There are many examples for which the C&B
algorithm finds rewritings that are good candidates for execution but
cannot be discovered by the more traditional query rewriting systems.
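The two rules can be sketched on a toy example. The code below is a simplified illustration and not the algorithm from the thesis or the papers: it handles only conjunctive queries over flat relations, and only constraints whose conclusion introduces no new variables (so the chase trivially terminates), and it enumerates subqueries by brute force. The relations R, S, T, the view V, and all function names are hypothetical.

```python
from itertools import combinations

# An atom is a tuple (relation, var1, var2, ...); all arguments are variables.
# A constraint is a pair (body_atoms, head_atoms), read as "body implies head".
# Assumption: every head variable also occurs in the body.

def homomorphisms(pattern, atoms, partial=None):
    """Yield each variable mapping that sends every pattern atom into `atoms`."""
    partial = dict(partial or {})
    if not pattern:
        yield partial
        return
    first, rest = pattern[0], pattern[1:]
    for atom in atoms:
        if atom[0] != first[0] or len(atom) != len(first):
            continue
        mapping = dict(partial)
        if all(mapping.setdefault(v, t) == t for v, t in zip(first[1:], atom[1:])):
            yield from homomorphisms(rest, atoms, mapping)

def chase(atoms, constraints):
    """Add every atom implied by the constraints until nothing new appears."""
    atoms = set(atoms)
    changed = True
    while changed:
        changed = False
        for body, head in constraints:
            for h in list(homomorphisms(body, atoms)):
                new = {(a[0],) + tuple(h[v] for v in a[1:]) for a in head}
                if not new <= atoms:
                    atoms |= new
                    changed = True
    return atoms

def equivalent(sub, universal, head_vars, constraints):
    """A subquery of the chased query is equivalent to it (under the constraints)
    if the chased query maps into chase(sub) while fixing the head variables."""
    fixed = {v: v for v in head_vars}
    chased = chase(sub, constraints)
    return any(True for _ in homomorphisms(sorted(universal), chased, fixed))

def backchase(universal, head_vars, constraints):
    """Return the minimal subqueries of `universal` that remain equivalent."""
    universal = sorted(universal)
    minimal = []
    for size in range(1, len(universal) + 1):
        for sub in combinations(universal, size):
            if any(set(m) <= set(sub) for m in minimal):
                continue  # a smaller equivalent subquery already covers this one
            if equivalent(sub, universal, head_vars, constraints):
                minimal.append(sub)
    return minimal

# Hypothetical view V(x, y) defined as R(x, y) joined with T(x), in both directions.
constraints = [
    ([("R", "x", "y"), ("T", "x")], [("V", "x", "y")]),
    ([("V", "x", "y")], [("R", "x", "y"), ("T", "x")]),
]
# Query: Q(a, b) :- R(a, b), T(a), S(b)
query = [("R", "a", "b"), ("T", "a"), ("S", "b")]
universal_plan = chase(query, constraints)
rewritings = backchase(universal_plan, ("a", "b"), constraints)
```

On this input the chase adds the atom V(a, b), and the backchase yields two minimal rewritings: the original query over R, T, and S, and a rewriting that uses the view V together with S.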
One novelty of the algorithm is that it unifies disparate techniques
such as semantic optimization based on integrity constraints,
rewriting queries using views, and rewriting queries using indexes. In
fact, it can do any rewriting as long as the right set of
constraints is specified. The data model considered includes nested
relations as well as OO classes.
Many of the ideas, concepts, and techniques, as well as the experience
accumulated while working on C&B, turned out to be quite influential
for my later projects (e.g., Clio).
Publications:
- A Chase Too Far?, with Alin Deutsch, Arnaud Sahuguet and Val Tannen.
SIGMOD'00, Dallas, Texas, May 2000, pp. 273-284.
- Physical Data Independence, Constraints, and Optimization with
Universal Plans, with Alin Deutsch and Val Tannen. VLDB'99, Edinburgh,
Scotland, September 1999, pp. 459-470.
- An Equational Chase for Path-Conjunctive Queries, Constraints, and
Views, with Val Tannen. ICDT'99, Jerusalem, Israel, January 1999,
pp. 39-57.
More about C&B query optimization can be found at the UPenn DB Group
site or in my dissertation: Object/Relational Query Optimization with
Chase and Backchase, PhD Thesis, 2000, Univ. of Pennsylvania, Advisor:
Val Tannen.