Data Compression & Scientific Modeling
Reducing storage barriers and breaking system bottlenecks
Transforming statistical modeling
Data compression and scientific modeling work hand in hand in this
vertically integrated project, advancing the state of the art in both
applied information theory and statistical modeling. Our theoretical
work spans "Foundations to Applications." It runs from the invention of
arithmetic coding for data compression to the creation of the Minimum
Description Length principle and the concept of Stochastic Complexity
for statistical modeling. At the same time, we apply these techniques
to solve IBM compression and modeling problems, from "Algorithms to
Implementations."
Our theories have revolutionized statistical modeling with fundamental
new tools for avoiding overfitting. Our compression hardware chips
enable the world's bank checks to be 'cleared' overnight. Our fast
classifier-building models have sped up data mining algorithms
tenfold. Our compression software keeps high-speed IBM printers
moving paper smoothly without a hitch. Our compression algorithms
speed Web searches and document transfers on the Internet.
For superior compression, the secret is in the modeling, and vice versa!
We've pushed back the frontiers of applied information theory with the
invention of general-purpose and adaptive arithmetic coding for
compression. An arithmetic coder can encode each symbol in very nearly
-log2 p bits, where p is the probability the model assigns to that
symbol. It therefore reduced solving a compression problem to merely
creating an optimal model, individually matched to each application.
Its introduction of adaptivity also drastically reduced the number of
compression models that had to be invented for related applications.
Here the model learns its statistics from the data as they are coded,
which gives it the robustness to cover large variations in data. With
these and other state-of-the-art techniques, we reduce storage barriers
and break system bottlenecks for our products.
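The following is a minimal Python sketch of the split between model and coder described above. It assumes a simple order-0 adaptive model, whose symbol counts start at 1 and grow as symbols are seen. It also uses exact rational arithmetic via the standard `fractions.Fraction` class.

This is not IBM's coder. Practical arithmetic coders use finite-precision integer arithmetic with incremental bit output. The names `AdaptiveModel`, `encode` and `decode` are hypothetical.

What the sketch does show is that the output length stays within one bit of the model's ideal code length, which is the sum of -log2 p over the coded symbols. The remaining room for improvement therefore lies in the model.

```python
from fractions import Fraction
import math


class AdaptiveModel:
    """Hypothetical order-0 adaptive model: every count starts at 1 and is
    incremented after the symbol is coded, so encoder and decoder rebuild
    identical statistics without transmitting a frequency table."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = {s: 1 for s in self.alphabet}

    def interval(self, symbol):
        """Return the [lo, hi) slice of [0, 1) currently assigned to symbol."""
        total = sum(self.counts.values())
        cum = 0
        for s in self.alphabet:
            if s == symbol:
                return Fraction(cum, total), Fraction(cum + self.counts[s], total)
            cum += self.counts[s]
        raise KeyError(symbol)

    def update(self, symbol):
        self.counts[symbol] += 1


def encode(message, alphabet):
    """Sketch only (exact Fractions, not a finite-precision coder).
    Returns (bit string, model's ideal code length in bits)."""
    model = AdaptiveModel(alphabet)
    low, width = Fraction(0), Fraction(1)
    ideal_bits = 0.0
    for sym in message:
        lo, hi = model.interval(sym)
        low += width * lo              # move to the start of the symbol's slice
        width *= hi - lo               # width shrinks by the symbol's probability
        ideal_bits -= math.log2(float(hi - lo))
        model.update(sym)
    # Emit the shortest binary fraction n / 2**k lying in [low, low + width).
    k = 1
    while True:
        n = math.ceil(low * 2**k)
        if Fraction(n, 2**k) < low + width:
            return format(n, f"0{k}b"), ideal_bits
        k += 1


def decode(bits, n_symbols, alphabet):
    """Invert encode(); this sketch sends the message length out of band."""
    model = AdaptiveModel(alphabet)
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n_symbols):
        for sym in model.alphabet:
            lo, hi = model.interval(sym)
            if low + width * lo <= value < low + width * hi:
                break                  # value falls inside this symbol's slice
        out.append(sym)
        low += width * lo
        width *= hi - lo
        model.update(sym)              # identical update keeps models in sync
    return "".join(out)


msg = "abracadabra"
bits, ideal = encode(msg, "abcdr")
assert decode(bits, len(msg), "abcdr") == msg
print(len(bits), "bits; model's ideal code length ~", round(ideal, 2), "bits")
```

The decoder repeats the encoder's `update` calls, so the adaptive model learns its statistics from the data itself and nothing else has to be sent. Swapping in a better model, for example one that conditions on preceding symbols, would shorten the output without touching the coding loop.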
Conversely, for superior modeling, our secret is to view modeling as
compression. The discovery of a duality between the mathematics of
information theory and statistical modeling led to the Minimum
Description Length principle and the concept of Stochastic Complexity,
both invented to avoid overfitting. These techniques introduced a new
cost metric for a statistical model: its compressibility. That is the
number of bits needed to describe the data with the model, including the
bits needed to describe the model itself. A model that fits noise too
closely pays for its extra parameters in bits, so the shortest total
description guards against overfitting. This metric is a new key to
solving statistical problems such as choosing the optimal precision of
model parameters, the model order, and even the model class. With such
new techniques, we're transforming statistical modeling for the world.
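The MDL idea of scoring a model by its total description length can be sketched for one of the problems named above: choosing a model order. The following is a minimal Python sketch under several assumptions:

- It uses the crude two-part form of MDL, not the refined Stochastic Complexity.
- It chooses the order of a binary Markov model.
- It charges the classic asymptotic cost of (1/2) log2 n bits per parameter. That cost corresponds to stating each parameter only to a precision of roughly 1/sqrt(n).

The helper names `two_part_mdl` and `select_order`, and the synthetic order-2 test source, are hypothetical.

```python
import math
import random
from collections import Counter


def two_part_mdl(bits, order, start):
    """Hypothetical helper: crude two-part code length (bits) of bits[start:]
    under a binary Markov model of the given order.
      data part  = -log2 P(data | maximum-likelihood parameters)
      model part = (1/2) log2 n bits for each of the 2**order parameters."""
    counts = Counter()
    for i in range(start, len(bits)):
        counts[(bits[i - order:i], bits[i])] += 1
    data_bits = 0.0
    for context in {c for c, _ in counts}:
        n0, n1 = counts[(context, "0")], counts[(context, "1")]
        for m in (n0, n1):
            if m:
                data_bits -= m * math.log2(m / (n0 + n1))
    n = len(bits) - start
    model_bits = 0.5 * (2 ** order) * math.log2(n)
    return data_bits + model_bits


def select_order(bits, max_order):
    """Pick the order with the shortest total description length. Every order
    codes the same symbols bits[max_order:], so the costs are comparable."""
    costs = {k: two_part_mdl(bits, k, max_order) for k in range(max_order + 1)}
    return min(costs, key=costs.get), costs


# Hypothetical test source: x[i] = x[i-1] XOR x[i-2], flipped with
# probability 0.05, i.e. a genuinely order-2 process.
rng = random.Random(0)
seq = [0, 1]
for _ in range(5000):
    bit = seq[-1] ^ seq[-2]
    if rng.random() < 0.05:
        bit ^= 1
    seq.append(bit)
bits = "".join(map(str, seq))

best, costs = select_order(bits, max_order=5)
for k, c in costs.items():
    print(f"order {k}: {c:8.1f} bits")
print("MDL choice:", best)
```

Higher orders always fit the data at least as well, so the data part alone would keep enlarging the model. The per-parameter charge is what makes the total description length bottom out at the order the data actually supports.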
Contact Ron Arps for more information.