IBM Personal Communication


Lucian Popa's Home Page 

IBM Research - Almaden 
Dept. 8CC/B1 
650 Harry Road 
San Jose, California 95120 USA

Phone: (408) 927-1914 
Fax:     (408) 927-3215 

Brief Information 

I joined IBM Almaden Research Center as a Research Staff Member in September 2000, right after I received my PhD in Computer Science from the University of Pennsylvania. Previously, I received an M.S. in Computer Science from Politehnica University of Bucharest, in Romania. My main research interests include data management, meta-data management, information integration, and database theory. At Almaden, I am a member of the Intelligent Information Integration group (aka the Clio group) managed by Howard Ho.

Research Projects

  • Schema Mapping Generation (Clio).
At IBM Almaden, in collaboration with the University of Toronto, we developed Clio, a semi-automatic system for data translation between different formats (schemas). Using a visual interface, a relatively non-expert user can rapidly construct, explore, and select among several alternative ways of transforming data that conforms to a source schema into data that conforms to a target schema. One of the novelties of the system is the automatic derivation of a finite set of queries (XQuery, XSLT, or SQL enhanced with id creation) from the visual specification of the schema mapping. The queries are derived based on the integrity constraints of the source, in such a way that the integrity constraints of the target are satisfied. The use of constraints makes it possible to build "intelligent" transformations between schemas with quite complex relationships (whether these schemas are relational or XML).
  • Clio Grows Up: From Research Prototype to Industrial Tool, with Laura Haas, Mauricio A. Hernandez, Howard Ho, and Mary Roth. Industrial paper in SIGMOD'05, Baltimore, MD, June 2005.
  • Translating Web Data, with Yannis Velegrakis, Renee J. Miller, Mauricio A. Hernandez, and Ronald Fagin. VLDB'02, Hong Kong SAR, China, August 2002, pp.598-609.
  • Schema Management, with Periklis Andritsos, Ronald Fagin, Ariel Fuxman, Laura Haas, Mauricio Hernandez, Howard Ho, Anastasios Kementsietsidis, Renee J. Miller, Felix Naumann, Yannis Velegrakis, Charlotte Vilarem and Ling-Ling Yan, IEEE Data Engineering Bulletin 25, 3, 2002, pp. 33-39.
  • Mapping XML and Relational Schemas with Clio, with Mauricio A. Hernandez, Yannis Velegrakis, Renee J. Miller, Felix Naumann and Howard Ho. System Demo in ICDE 2002, San Jose, CA.
  • The Clio Project: Managing Heterogeneity, with Renee J. Miller, Mauricio A. Hernandez, Laura M. Haas, Ling-Ling Yan, C. T. Howard Ho and Ronald Fagin. SIGMOD Record 30(1), March 2001, pp. 78-83.
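The idea of deriving queries from a mapping specification plus source constraints can be illustrated with a small sketch. This is not Clio's actual algorithm: the function generate_insert, the toy tables (emp, dept, staff), and the dictionary encoding of correspondences and foreign keys are all assumptions made for illustration. The sketch only shows how a source foreign key can supply the join condition for a generated SQL statement.

```python
import sqlite3

def generate_insert(target, correspondences, foreign_keys):
    """Build one INSERT ... SELECT statement for a target table.

    correspondences: target column -> (source table, source column)
    foreign_keys:    (child table, child column) -> (parent table, parent column)
    Both encodings are hypothetical, chosen only for this sketch.
    """
    tables = []
    for table, _column in correspondences.values():
        if table not in tables:
            tables.append(table)
    # Each source foreign key linking two mapped tables becomes a join condition.
    joins = [
        f"{child}.{ccol} = {parent}.{pcol}"
        for (child, ccol), (parent, pcol) in foreign_keys.items()
        if child in tables and parent in tables
    ]
    select_list = ", ".join(f"{t}.{c}" for t, c in correspondences.values())
    sql = (f"INSERT INTO {target} ({', '.join(correspondences)}) "
           f"SELECT {select_list} FROM {', '.join(tables)}")
    if joins:
        sql += " WHERE " + " AND ".join(joins)
    return sql

# Toy source (emp, dept) and target (staff) schemas, invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE emp (name TEXT, dept_id INTEGER REFERENCES dept(id));
    CREATE TABLE staff (name TEXT, city TEXT);
    INSERT INTO dept VALUES (1, 'San Jose'), (2, 'Toronto');
    INSERT INTO emp VALUES ('Ann', 1), ('Bob', 2);
""")
sql = generate_insert(
    "staff",
    {"name": ("emp", "name"), "city": ("dept", "city")},
    {("emp", "dept_id"): ("dept", "id")},
)
conn.execute(sql)
rows = sorted(conn.execute("SELECT name, city FROM staff"))
print(sql)
print(rows)
```

Without the foreign key, the generated statement would be a cross product of emp and dept; the constraint is what makes the transformation associate each employee with the right city.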

  • Schema Mapping Management.
The goal of this project is to study the algorithmic and foundational aspects as well as the implementation issues surrounding a system for schema mapping management. Such a system is envisioned to be an integral part of any meta-data management system that enables cooperation between applications at the data level. A somewhat similar framework (there called model management) is being investigated by Phil Bernstein and his group at Microsoft Research. Some of the main challenges that my collaborators and I are addressing are:
    • design of a schema mapping language that:
      • is high-level, declarative, and logic-based,
      • is simple enough to be understood and manipulated by tools,
      • facilitates (semi-)automatic generation of schema mappings,
      • conveys enough information for runtime (e.g., to exchange data or answer queries, to generate XSLT transformations, etc.)
    • semantics of schema mappings and data exchange based on schema mappings:
      • what does a schema mapping mean in terms of the actual instances of the schemas?
      • how do we perform data translation, from one schema to another, based on a schema mapping specification?
    • code generation based on schema mappings:
      • how to generate efficient queries or transformations in various languages to (fully or partially) implement the mappings
      • Clio is an example of a first step in this direction
    • query answering and query rewriting over schema mappings:
      • how to rewrite a query over one schema into a query over another schema, given the schema mapping
      • federation and data integration aspects of query answering and rewriting
    • composition of sequential schema mappings:
      • this is a crucial feature that enables re-use of mappings when schemas are different or change
    • preserving mappings under schema evolution:
      • schema evolution is a hard problem in many ways; here we are addressing the problem of maintaining schema mappings in the face of schema evolution (whether this evolution is incremental or more drastic)
      • we have explored two approaches so far:
        • incremental adaptation algorithm, using a change-based representation of schema evolution (see VLDB'03 paper)
        • mapping composition approach, using a mapping-based representation of schema evolution (see VLDB'05 paper)
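To make the data-exchange question in the list above concrete, here is a minimal sketch of materializing a target instance from a source instance under a logic-based mapping. The mapping itself is hypothetical and consists of two dependencies: Emp(n, d) -> Staff(n, d), and Emp(n, d) -> exists M . Manager(d, M). The existential variable M has no counterpart in the source, so the sketch fills it with a labeled null (N1, N2, ...), reusing the same null whenever the second dependency is already satisfied for a department. The function name exchange and the null naming scheme are assumptions.

```python
from itertools import count

def exchange(emp_rows):
    """Materialize Staff and Manager from Emp under the hypothetical mapping
         Emp(n, d) -> Staff(n, d)
         Emp(n, d) -> exists M . Manager(d, M)
    Unknown managers are represented by labeled nulls, one per department."""
    fresh = count(1)
    null_for_dept = {}
    staff, manager = set(), set()
    for name, dept in emp_rows:
        # First dependency: copy the employee into the target.
        staff.add((name, dept))
        # Second dependency: invent a null only if no Manager(dept, _) exists yet.
        if dept not in null_for_dept:
            null_for_dept[dept] = f"N{next(fresh)}"
        manager.add((dept, null_for_dept[dept]))
    return staff, manager

staff, manager = exchange([("ann", "db"), ("bob", "db"), ("cy", "ai")])
print(sorted(staff))
print(sorted(manager))
```

The labeled nulls record that a manager must exist for each department without committing to a particular value, which is one way to give a mapping meaning at the level of instances.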

  • Query Optimization with Chase and Backchase (C&B). 
At the University of Pennsylvania, my thesis work focused on a new technique for query rewriting. This technique uses two basic rules: chase and backchase. A query is chased with constraints in order to produce a larger, but equivalent, query that incorporates all the alternative ways of answering the original query (views, indexes, other relations or OO classes). This larger query can then be minimized, by using the backchase rule, to produce a complete set of minimal and equivalent rewritings. There are many examples for which the C&B algorithm finds rewritings that are good candidates for execution but cannot be discovered by more traditional query rewriting systems.
One novelty of the algorithm is that it unifies disparate techniques such as semantic optimization based on integrity constraints, rewriting queries using views, and rewriting queries using indexes. In fact, it can do any rewriting as long as the right set of constraints is specified. The data model considered includes nested relations as well as OO classes.
Many of the ideas, concepts, and techniques, as well as the experience accumulated while working on C&B, turned out to be quite influential for my later projects (e.g., Clio).
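The two rules can be sketched on a toy conjunctive query. The code below is a simplified illustration under stated assumptions, not the implementation from the thesis: queries are flat sets of atoms over variables only (no constants, nested relations, or OO classes), constraints are (premise, conclusion) pairs, the backchase is a brute-force search over subsets, and all names (homomorphisms, chase, backchase, the relations R, S and the view V) are hypothetical.

```python
from itertools import combinations, count

def homomorphisms(atoms, facts, assignment=None):
    """Yield every extension of `assignment` that maps all `atoms` into `facts`."""
    assignment = assignment or {}
    if not atoms:
        yield assignment
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(assignment)
        if all(extended.setdefault(a, f) == f for a, f in zip(args, fact_args)):
            yield from homomorphisms(rest, facts, extended)

def chase(body, constraints):
    """Add the conclusion of every applicable, unsatisfied constraint until a fixpoint."""
    body, fresh, changed = set(body), count(), True
    while changed:
        changed = False
        for premise, conclusion in constraints:
            for h in list(homomorphisms(premise, body)):
                if next(homomorphisms(conclusion, body, h), None) is None:
                    ext = dict(h)
                    for _pred, args in conclusion:
                        for a in args:  # existential variables get fresh names
                            ext.setdefault(a, f"_v{next(fresh)}")
                    body |= {(p, tuple(ext[a] for a in args)) for p, args in conclusion}
                    changed = True
    return body

def backchase(head, universal_plan, constraints):
    """Return the minimal subqueries of the universal plan that are equivalent to it."""
    atoms, rewritings = sorted(universal_plan), []
    for size in range(1, len(atoms) + 1):
        for sub in combinations(atoms, size):
            if any(set(r) <= set(sub) for r in rewritings):
                continue  # contains a smaller rewriting, so not minimal
            fixed = {v: v for v in head}  # head variables must map to themselves
            if next(homomorphisms(atoms, chase(sub, constraints), fixed), None) is not None:
                rewritings.append(sub)
    return rewritings

# Query Q(x, z) :- R(x, y), S(y, z), and a view V(a, c) defined as R(a, b), S(b, c).
query = [("R", ("x", "y")), ("S", ("y", "z"))]
view_constraints = [
    ([("R", ("a", "b")), ("S", ("b", "c"))], [("V", ("a", "c"))]),  # view contains the join
    ([("V", ("a", "c"))], [("R", ("a", "b")), ("S", ("b", "c"))]),  # view contains nothing else
]
universal_plan = chase(query, view_constraints)
rewritings = backchase(("x", "z"), universal_plan, view_constraints)
print(sorted(universal_plan))
print(rewritings)
```

In this example the chase adds the view atom V(x, z) to the query, and the backchase then finds two minimal rewritings: one that scans only the view, and the original join of R and S. The chase shown here terminates on this input but is not guaranteed to terminate for arbitrary constraints.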


More about C&B query optimization can be found at the UPenn DB Group site or in my dissertation:
Object/Relational Query Optimization with Chase and Backchase, PhD Thesis, 2000, Univ. of Pennsylvania, Advisor: Val Tannen.

Other Publications


I grew up in Romania and spent five years in Philadelphia, Pennsylvania. Now I live in San Jose, California with my wife and daughter Isabella.
