IBM Almaden Computer Science
Data Compression & Scientific Modeling

Reducing storage barriers and breaking system bottlenecks

Transforming statistical modeling

Data compression and scientific modeling work hand in hand in this vertically integrated project, pushing the envelope in both applied information theory and statistical modeling. Our theoretical work spans "Foundations to Applications," from the invention of arithmetic coding for data compression to the creation of the Minimum Description Length (MDL) principle and the concept of Stochastic Complexity for statistical modeling. At the same time, we apply these techniques to IBM compression and modeling problems, from "Algorithms to Implementations."

Our theories have revolutionized statistical modeling with fundamental new tools for avoiding overfitting. Our compression hardware chips enable the world's bank checks to be 'cleared' overnight. Our fast classifier-building models have sped up data mining algorithms tenfold. Our compression software keeps high-speed IBM printers moving paper smoothly, without a 'hitch'. Our compression algorithms speed Web searches and document transfers on the Internet.

For superior compression, the secret is in the modeling (and vice versa!). We've pushed back the frontiers of applied information theory with the invention of general-purpose and adaptive arithmetic coding for compression. Arithmetic coding reduced each compression problem to the creation of an optimal model matched to its application. Adaptivity also drastically reduced the number of compression models that had to be invented for related applications, providing the robustness to cover large variations in the data. With these and other state-of-the-art techniques, we reduce storage barriers and break system bottlenecks in our products.
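
The division of labor described above, a general-purpose coder that only consumes probabilities plus a separate model that supplies them, can be illustrated with a minimal sketch. It is not IBM's implementation: it uses Python's exact `fractions.Fraction` arithmetic for clarity, whereas practical arithmetic coders use fixed-precision integer arithmetic with renormalization. The names `AdaptiveModel`, `encode`, and `decode` are hypothetical, and the model is an assumed order-0 adaptive frequency count. Replacing the model changes how well the data compresses without touching the coder.

```python
from fractions import Fraction
import math


class AdaptiveModel:
    """Hypothetical order-0 adaptive model: every symbol starts with a count
    of 1, and each coded symbol's count is incremented afterwards, so encoder
    and decoder update identically and no frequency table is transmitted."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = {s: 1 for s in self.alphabet}

    def interval(self, symbol):
        """Return the [low, high) slice of [0, 1) currently assigned to symbol."""
        total = sum(self.counts.values())
        low = 0
        for s in self.alphabet:
            if s == symbol:
                return Fraction(low, total), Fraction(low + self.counts[s], total)
            low += self.counts[s]
        raise KeyError(symbol)

    def update(self, symbol):
        self.counts[symbol] += 1


def encode(message, alphabet):
    """Narrow [0, 1) once per symbol, then emit the shortest bit string
    whose binary fraction lies inside the final interval."""
    model = AdaptiveModel(alphabet)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = model.interval(sym)
        low += width * s_low
        width *= s_high - s_low
        model.update(sym)
    k = 1
    while True:
        code = math.ceil(low * 2 ** k)  # smallest k-bit fraction >= low
        if Fraction(code, 2 ** k) < low + width:
            return format(code, f"0{k}b")
        k += 1


def decode(bits, n_symbols, alphabet):
    """Replay the same model updates to recover n_symbols symbols."""
    model = AdaptiveModel(alphabet)
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n_symbols):
        target = (value - low) / width  # position of value inside current interval
        for sym in model.alphabet:
            s_low, s_high = model.interval(sym)
            if s_low <= target < s_high:
                break
        low += width * s_low
        width *= s_high - s_low
        model.update(sym)
        out.append(sym)
    return "".join(out)
```

For example, `decode(encode(msg, "abcdr"), len(msg), "abcdr")` returns `msg` for any string over that alphabet. The decoder must be told the message length here; a real coder would instead transmit it or reserve an end-of-message symbol, which is another simplifying assumption of this sketch.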

Conversely, the secret to superior modeling is to view it as compression. From the discovery of a duality between the mathematics of information theory and statistical modeling came the Minimum Description Length principle and the concept of Stochastic Complexity, both designed to avoid overfitting the data. These techniques introduced a new cost metric for a statistical model: its compressibility, that is, the total number of bits needed to describe the model and the data with it. This metric is a new key to statistical problems such as choosing the optimal precision of model parameters, the model order, and even the model class. With these techniques, we're transforming statistical modeling for the world.
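
To make the compressibility cost concrete, here is a minimal sketch of MDL model-order selection for a binary sequence, under stated assumptions: the candidate models are order-k Markov chains, the first k symbols are sent verbatim at 1 bit each, and each of the 2^k maximum-likelihood parameters is charged the classic two-part penalty of about (1/2) log2 n bits, which corresponds to coding it to a precision of roughly 1/sqrt(n). This is the simple asymptotic form of the criterion, not the full Stochastic Complexity, and the function names `data_bits`, `description_length`, and `select_order` are hypothetical.

```python
import math
from collections import Counter


def data_bits(seq, k):
    """Bits to code seq[k:] with an order-k Markov model whose conditional
    probabilities are set to their maximum-likelihood (empirical) values."""
    pair_counts = Counter()
    for i in range(k, len(seq)):
        pair_counts[(seq[i - k:i], seq[i])] += 1
    context_counts = Counter()
    for (ctx, _), c in pair_counts.items():
        context_counts[ctx] += c
    return -sum(c * math.log2(c / context_counts[ctx])
                for (ctx, _), c in pair_counts.items())


def description_length(seq, k):
    """Two-part code length in bits: k verbatim bits for the first k symbols,
    plus the data given the model, plus about (1/2) log2 n bits for each of
    the 2**k free parameters (one probability per binary context)."""
    n = len(seq) - k
    n_params = 2 ** k
    return k + data_bits(seq, k) + 0.5 * n_params * math.log2(n)


def select_order(seq, max_order):
    """Return the Markov order in 0..max_order with the shortest description."""
    if len(seq) <= max_order:
        raise ValueError("sequence must be longer than max_order")
    return min(range(max_order + 1), key=lambda k: description_length(seq, k))
```

For instance, `select_order("0001" * 250, 5)` picks order 3: a three-symbol context predicts this period-4 sequence perfectly, while each further order doubles the parameter cost without saving any data bits. Without the parameter term, longer contexts would tend to win on fit alone, which is the overfitting the criterion is meant to prevent.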

Contact Ron Arps for more information.
