IBM Research - Almaden
Data Compression & Scientific Modeling

Past Project
Data compression and scientific modeling work hand in hand in this vertically integrated project, which advances both applied information theory and statistical modeling. From the invention of arithmetic coding for data compression to the creation of the Minimum Description Length principle and Stochastic Complexity concepts for statistical modeling, our theoretical work spans "Foundations to Applications." At the same time, we apply these techniques to solve IBM compression and modeling problems, from "Algorithms to Implementations."

Our theories have revolutionized statistical modeling with fundamental new tools to avoid the overfitting of data. Our compression hardware chips enable the world's bank checks to be 'cleared' overnight. Our fast classifier-building models have sped up data mining algorithms tenfold. Our compression software keeps high-speed IBM printers smoothly moving paper without a 'hitch'. Our compression algorithms speed Web searches and document transfers on the Internet.

For superior compression, the secret is in the modeling (and vice versa!). We've pushed back the frontiers of applied information theory with the invention of general-purpose, adaptive arithmetic coding for compression. Arithmetic coding reduced compression problems to the creation of optimal models, each matched to its application. Its adaptivity also drastically reduced the number of compression models that must be invented for related applications, providing the robustness to cover large variations in data. With these and other state-of-the-art techniques, we reduce storage barriers and break system bottlenecks for our products.
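
The point that compression reduces to modeling can be illustrated with a minimal sketch. It is not the project's implementation: the function name adaptive_code_length, the add-one smoothing, and the test string are all assumptions chosen for illustration. The sketch computes the ideal code length that an adaptive model assigns to some data, which is close to the number of bits an arithmetic coder driven by that model would emit. Swapping in a better model lowers the bit count without touching the coder.

```python
import math
from collections import defaultdict


def adaptive_code_length(data: bytes, order: int = 0, alphabet_size: int = 256) -> float:
    """Ideal code length in bits under an adaptive context model (illustrative).

    Each byte is predicted from the counts seen so far in its context
    (the previous `order` bytes). Add-one smoothing is an assumption made
    here so that no symbol ever receives probability zero.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        p = (counts[ctx][sym] + 1) / (totals[ctx] + alphabet_size)
        bits += -math.log2(p)      # cost of coding this symbol
        counts[ctx][sym] += 1      # adapt: update the model after coding
        totals[ctx] += 1
    return bits


# Hypothetical test data: a strongly repetitive string.
text = b"ab" * 128
for k in (0, 1):
    print(f"order-{k} adaptive model: {adaptive_code_length(text, k):.1f} bits")
```

Because the model adapts as it codes, the same routine handles data with different statistics without a separately designed model for each case; only the choice of context changes.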

Conversely, for superior modeling, our secret comes from viewing it as compression. Following the discovery of a duality between the mathematics of information theory and statistical modeling, the Minimum Description Length principle and Stochastic Complexity concepts were invented to avoid overfitting data. These techniques introduced a new cost metric for a statistical model: its compressibility, measured as information in bits. This metric is a key to solving statistics problems such as determining optimal model parameter precision, model order, and even model class. With such techniques, we're transforming statistical modeling for the world.
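
As a rough illustration of choosing a model order by description length, the sketch below scores binary Markov models of several orders with a simplified two-part code: the bits needed to encode the data under the fitted model, plus (k / 2) * log2(n) bits for its k parameters. That penalty is a common asymptotic approximation, not the full Stochastic Complexity. The helper names, the synthetic data, and the parameter values are assumptions made for this example.

```python
import math
import random
from collections import Counter


def sample_chain(n: int, stay: float = 0.9, seed: int = 0) -> str:
    """Synthetic first-order binary chain: repeat the previous bit with prob. `stay`."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1)]
    while len(bits) < n:
        prev = bits[-1]
        bits.append(prev if rng.random() < stay else 1 - prev)
    return "".join(map(str, bits))


def description_length(seq: str, order: int, start: int) -> float:
    """Two-part code length in bits for seq[start:] under an order-`order` model.

    `start` must be at least `order`; using the same `start` for every
    candidate order means all models are scored on the same symbols.
    """
    n = len(seq) - start
    ctx_counts, pair_counts = Counter(), Counter()
    for i in range(start, len(seq)):
        ctx = seq[i - order:i]
        ctx_counts[ctx] += 1
        pair_counts[(ctx, seq[i])] += 1
    # Data cost: negative log2 of the maximized likelihood.
    data_bits = -sum(c * math.log2(c / ctx_counts[ctx])
                     for (ctx, _), c in pair_counts.items())
    # Model cost: one free probability per binary context.
    k = 2 ** order
    model_bits = 0.5 * k * math.log2(n)
    return data_bits + model_bits


MAX_ORDER = 4
seq = sample_chain(2000)
lengths = {k: description_length(seq, k, MAX_ORDER) for k in range(MAX_ORDER + 1)}
for k, bits in lengths.items():
    print(f"order {k}: {bits:.1f} bits")
print("selected order:", min(lengths, key=lengths.get))
```

Higher orders always fit the data at least as well, so the data cost alone would favor the most complex model; the parameter cost is what penalizes overfitting.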

   