
Infrastructure for Intelligent Information Systems

Overview

Databases must do more than simply store and process the increasing amount of data in our world. They must also effectively organize and streamline the data to best aid users.

The Infrastructure for Intelligent Information Systems (IIIS) group is at the forefront of research into data analytics, search technologies, and integration. Our research ranges from generating innovative information extraction techniques to analyzing data translation to optimizing schema mapping, with the ultimate goal of developing next-generation, responsive information systems. Throughout all of our work, emphasis is placed on adaptability, usability, and scalability.

Featured Projects

SystemT
The SystemT project is an amalgam of two major research themes centered around analytics and search over unstructured content. These two themes are represented by two corresponding sub-projects: SystemT-Information Extraction (SystemT-IE) and SystemT-Programmable Search (SPS). Our main project page describes both sub-projects in greater detail.

Project leader: Howard Ho

Content Analytics Platform (CAP)
With the tremendous growth in the volume of semi-structured and unstructured content within enterprises (e.g., email archives and customer support databases), there is increasing interest in harnessing this content to power search and business intelligence applications. Traditional enterprise analytics infrastructure is not designed to meet the demands of large-scale, compute-intensive analytics over semi-structured content. In the CAP project, we are developing an enterprise content analytics platform that leverages the Hadoop MapReduce framework to support this emerging class of analytic workloads.

Project leader: Sriram Raghavan


Gumshoe
In contrast with the radical advances in Web search over the last several years, search over enterprise intranets has remained a difficult and largely unsolved problem. Several critical enterprise-specific factors differentiate the search problem on the intranet from that on the Web. Gumshoe brings together and leverages our work with SystemT and the Content Analytics Platform to build a scalable end-to-end intranet search engine that addresses these challenges. Gumshoe has been in development over the past two years and was recently deployed as a company-wide internal pilot.

Project leader: Sriram Raghavan

Midas
In past years, the Information Integration Group has focused on developing a declarative framework for schema mappings to facilitate integration of data from multiple heterogeneous sources. Recently, we have extended the scope of our information integration research and started a new research project called Midas, which aims to utilize the vast amount of publicly available information that is present in both structured and unstructured formats. To illustrate the capabilities we would like to provide, consider the following questions that might be relevant to a regulator or to an investor: "Are there any violations of the Clayton Antitrust Act (two competing corporations having a common director)?", "How much assistance have publicly traded companies obtained under the TARP program?", and "What are the financial details of banks on the FDIC troubled bank list?" Aggregating the entities from the extracted facts to answer such questions requires techniques for data cleansing and normalization, entity resolution, schema mapping and data fusion, and temporal analysis. Finally, we need to support scalability with large amounts of data and continuous updates. To this end, our architecture leverages the Cloud infrastruct

Project leader: Rajasekar Krishnamurthy

SystemML
There are many small systems for in-memory/in-core analysis of datasets in the MByte or GByte range, running on a single machine. However, there is a pervasive need to enable machine learning (ML) on big data. In SystemML, we address the challenges of large-scale analytics, namely: big data at TByte and PByte scale, scalability to large clusters with thousands of nodes, productivity of data analysts through a higher-level language, and optimization of execution strategies for varying data sets and system configurations.

Project leader: Berthold Reinwald

Recent Publications

Dr. Shivakumar Vaithyanathan

Dr. Shivakumar Vaithyanathan, Senior Manager, Infrastructure for Intelligent Information Systems
