Overview
Enterprises generate tremendous volumes of data from internal sources such as transaction systems, web logs, product tracking information, and online customer correspondence. They also use a great deal of information about customer demographics, competitors, public sentiment, and more.
Platforms such as Hadoop, an Apache open source project, have been designed to store web-scale data and support complex web analytics programmed using the MapReduce paradigm. We are exploring the use of Hadoop, with important extensions, as a platform for extreme enterprise analytics: that is, extremely complex analytics on extremely large volumes of data.
Our goal is to build a powerful analytics platform, and to use it to create analytic applications providing solutions to problems that have not been economically feasible to solve until now.
Project Contact: John McPherson
Publications
- Kevin S. Beyer, Vuk Ercegovac, Rajasekar Krishnamurthy, Sriram Raghavan, Jun Rao, Frederick Reiss, Eugene J. Shekita, David E. Simmen, Sandeep Tata, Shivakumar Vaithyanathan, Huaiyu Zhu: Towards a Scalable Enterprise Content Analytics Platform. IEEE Data Eng. Bull. 32(1): 28-35 (2009)
- Andrey Balmin, Latha S. Colby, Emiran Curtmola, Quanzhong Li, Fatma Ozcan: Search Driven Analysis of Heterogenous XML Data. CIDR 2009
- Wensheng Wu, Berthold Reinwald, Yannis Sismanis, Rajesh Manjrekar: Discovering topical structures of databases. SIGMOD Conference 2008: 1019-1030

