IBM Research

Information Management

Computer Science


Overview

For more than three decades, IBM Research has produced major contributions to the area of data management. These include E. F. Codd's seminal work on relational algebra; the System R relational database management system prototype, which led to IBM's DB2®; ARIES transaction recovery and logging; Starburst extensible database technology; and DB2 parallel database technology.

The Information Management (IM) group is proud to carry on IBM's rich tradition of excellence with groundbreaking research in data management technology. We utilize our expertise in XML, machine learning, text analysis, artificial intelligence, and application-enabling middleware to tackle the challenges posed by the proliferation of unstructured and semi-structured data in business applications, the life sciences, and personal information management.

The areas we focus on include query optimization, OLAP cubes, information processing, enterprise mashups, multidimensional content, and integration of structured, semi-structured, and unstructured data, as well as the emerging areas of e-commerce, Internet, and mobile applications.

The research contributions of our projects not only result in patents and papers at leading conferences, but also provide a steady stream of technology that ships in IBM products such as IBM's Content Manager, Lotus Discovery Server, and DB2.

Further information about IBM's work in data management can be found at:

IBM Research Data Management

Current Projects:


eXtreme Analytics Platform: Platforms such as Hadoop, an Apache open source project, have been designed to store web-scale data and support complex web analytics programmed using the Map-Reduce paradigm. We are exploring the use of Hadoop, with important extensions, as an enterprise platform for extreme, enterprise analytics - that is, extremely complex analytics on extremely large volumes of data. Our goal is to build a powerful analytics platform, and to use it to create analytic applications providing solutions to problems that have not been economically feasible to solve until now.
  • Project Contact: John McPherson

Jaql: Jaql is a query language whose objectives are to research semi-structured query processing, extensibility, and parallelization. We use JSON (JavaScript Object Notation) as a simple yet flexible way to represent data that ranges from flat, relational data to semi-structured, XML data. So long as a "JSON view" over data can be defined, Jaql will process it. Jaql is developed as open source under the Apache 2.0 License.
  • Project Contact: Eugene Shekita
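As a rough illustration of the "JSON view" idea described for Jaql, the sketch below uses plain Python rather than Jaql itself. The records, field names, and the `in_dept` helper are made up for this example; it only shows that one filter can run unchanged over flat, relational-style rows and nested, XML-like records once both are expressed as JSON.

```python
import json

# Hypothetical data: flat, relational-style rows ...
flat_rows = json.loads(
    '[{"id": 1, "name": "Ann", "dept": "sales"},'
    ' {"id": 2, "name": "Bob", "dept": "research"}]'
)
# ... and a nested, semi-structured record with a repeated sub-element.
nested_docs = json.loads(
    '[{"id": 3, "name": "Cy", "dept": "research",'
    '  "papers": [{"title": "A", "year": 2009}]}]'
)

def in_dept(records, dept):
    """Return the names of all records in a department, ignoring extra fields."""
    return [r["name"] for r in records if r.get("dept") == dept]

# The same filter handles both shapes of data.
names = in_dept(flat_rows + nested_docs, "research")
print(names)
```

The nested `papers` field is simply carried along, which is the sense in which a JSON representation spans both kinds of data.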

Blink: The Blink project is developing a scalable query engine that consistently responds to Business Intelligence queries against a data warehouse in mere seconds, without the necessity for the complicated and human-intensive "performance layer" of indexes, materialized views, and pre-computation of today's data warehouse systems. It exploits many disruptive hardware technologies -- including large main memories, commodity multi-core processors, and fast interconnects -- together with innovative software developed by Almaden Research to highly compress and de-normalize data, apply query predicates and perform grouping on the compressed data, maximize parallelism, minimize L2 cache misses, and significantly simplify administration.
  • Project Contact: Guy Lohman
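To make "apply query predicates and perform grouping on the compressed data" concrete, here is a minimal sketch that assumes simple dictionary encoding, which is a stand-in and not Blink's actual compression scheme. The column values and the `dictionary_encode` helper are hypothetical. The predicate is translated into a code comparison once, and values are decoded only for the final result.

```python
from collections import Counter

def dictionary_encode(values):
    """Return (sorted dictionary, list of integer codes) for one column."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

# Hypothetical de-normalized fact data: a region column and a sales column.
region = ["east", "west", "east", "north", "west", "east"]
sales = [10, 20, 30, 40, 50, 60]

region_dict, region_codes = dictionary_encode(region)

# Predicate region <> 'north': look up the code once, then compare integers.
north_code = region_dict.index("north")
selected = [i for i, c in enumerate(region_codes) if c != north_code]

# Group by the integer code, never touching the original strings.
totals = Counter()
for i in selected:
    totals[region_codes[i]] += sales[i]

# Decode only the (small) result set.
result = {region_dict[c]: t for c, t in totals.items()}
print(result)
```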

CloudDB: Traditional relational databases are often too rigid and don't provide enough scalability for many content-oriented applications. In the CloudDB project, we are building a distributed database on commodity hardware that provides a flexible data model, scalability (to hundreds of nodes), elasticity (incrementally adding nodes with no down time), and fault tolerance. Our research heavily leverages scalable, open source data stores such as HBase and Cassandra.
  • Project Contact: Eugene Shekita

Recent Publications:

  • Andrey Balmin, Latha S. Colby, Emiran Curtmola, Quanzhong Li, Fatma Ozcan: Search Driven Analysis of Heterogenous XML Data. CIDR 2009
  • Knut Stolze, Vijayshankar Raman, Richard Sidle, O. Draese: Bringing BLINK Closer to the Full Power of SQL. BTW 2009: 157-166
  • David E. Simmen, Frederick Reiss, Yunyao Li, Suresh Thalamati: Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT. SIGMOD Conference 2009: 1123-1126
  • Kevin S. Beyer, Vuk Ercegovac, Rajasekar Krishnamurthy, Sriram Raghavan, Jun Rao, Frederick Reiss, Eugene J. Shekita, David E. Simmen, Sandeep Tata, Shivakumar Vaithyanathan, Huaiyu Zhu: Towards a Scalable Enterprise Content Analytics Platform. IEEE Data Eng. Bull. 32(1): 28-35 (2009)
  • Fei Xu, Kevin S. Beyer, Vuk Ercegovac, Peter J. Haas, Eugene J. Shekita: E = MC3: managing uncertain enterprise data in a cluster-computing environment. SIGMOD Conference 2009: 441-454

Dr. Hamid Pirahesh
IBM Fellow, Senior Manager, Data Management and Database Technology Institute (DBTI)

