Content Analytics Platform

Project Description

With the tremendous growth in the volume of semi-structured and unstructured content within enterprises (e.g., email archives and customer support databases), there is increasing interest in harnessing this content to power search and business intelligence applications. Traditional enterprise analytics infrastructure is not designed to meet the demands of large-scale, compute-intensive analytics over semi-structured content.

In the Content Analytics Platform (CAP) project, we are developing an enterprise content analytics platform that leverages the Hadoop MapReduce framework to support this emerging class of analytic workloads. Two core components of the platform are Jaql, a declarative language for expressing transformations over semi-structured data, and SystemT-IE, a high-performance information extraction engine. In addition, we are building MetaTracker, a data-centric flow manager used to define, manage, and deploy analytic workflows on this software stack.

Project Contact: Sriram Raghavan
