In this project, we study delta compression, the algorithmic task of
finding common strings between versions of data and using the commonality
between versions to compactly encode one version by describing it
as a set of changes from its companion. Previous differencing algorithms
run in O(n^2) time and O(n) space.
We have developed
a new algorithm that runs in O(n) time and O(1) space that closely
approximates the compression performance of the previous algorithms.
We have also developed two additional algorithms based on heuristics
which improve the compression performance of the O(n) algorithm
and the runtime performance of the O(n^2) algorithm.
Our treatment
includes theoretical results that limit the compression power of
algorithms under natural performance assumptions. All of our algorithms
for differencing operate at a fine granularity and make no assumptions
about the format or alignment of input data. We have also explored
their performance properties experimentally.
In addition, we have
developed an algorithm for modifying delta compressed files so that
the compressed versions may be reconstructed without scratch space.
This allows network clients with limited resources to efficiently
update software by retrieving delta compressed versions over a network.
Delta compression
for binary files (compactly encoding a version of data with only
the changed bytes from a previous version) may be used to efficiently
distribute software over low bandwidth channels, such as the Internet.
Traditional methods for rebuilding these delta files require memory
or storage space on the target machine for both the old and new
versions of the file to be reconstructed.
With the advent
of network computing and Internetenabled devices, many of these
network attached target machines have limited additional scratch
space. We have developed an algorithm for modifying a delta compressed
version file so that it may rebuild the new file version in the
space that the current version occupies.
