
Algorithms and Complexity

New Models in the WWW

Traditional random graph models cannot explain various phenomena (degree distributions, clique counts, etc.) that are manifest in massive graphs like the World-Wide Web. A recent paper (VLDB '99) proposes a new random graph model for such graphs. Earlier versions of the model appeared in a paper (COCOON '99). A more recent paper analyzes some properties of random graphs generated by this model.
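
To give the flavor of such models, here is a minimal sketch of a "copying"-style random graph generator: each new page copies some of its links from an existing page and chooses the rest uniformly at random. The parameters and details below are illustrative only and are not the exact model of the papers.

```python
import random

def copying_model(n, out_degree=3, copy_prob=0.5, seed=0):
    """Grow a directed graph one node at a time.

    Each new node v gets `out_degree` out-links.  For each link, with
    probability `copy_prob` it copies a link of a randomly chosen
    earlier node (its "prototype"); otherwise it links to a uniformly
    random earlier node.  Copying is what produces heavy-tailed
    in-degrees and an abundance of small bipartite cliques.
    (Illustrative parameters, not the papers' exact model.)
    """
    rng = random.Random(seed)
    out = {0: []}                       # adjacency: node -> list of targets
    for v in range(1, n):
        prototype = rng.randrange(v)    # earlier node whose links may be copied
        out[v] = []
        for _ in range(out_degree):
            if out[prototype] and rng.random() < copy_prob:
                target = rng.choice(out[prototype])   # copy one of its links
            else:
                target = rng.randrange(v)             # fresh uniform link
            out[v].append(target)
    return out

if __name__ == "__main__":
    out = copying_model(2000)
    indegree = {}
    for targets in out.values():
        for t in targets:
            indegree[t] = indegree.get(t, 0) + 1
    print("max in-degree:", max(indegree.values()))   # far above the average
```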

How does one model browsing activity on the WWW, which involves the use of both hyperlinks and "back buttons"? A recent paper proposes and analyzes a new random walk model that incorporates such backoff processes.
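
As a rough illustration of the idea (not the paper's exact process), the following sketch simulates a surfer who, at each step, either presses the back button with some fixed probability or follows a uniformly random out-link:

```python
import random

def backoff_walk(out_links, start, steps, back_prob=0.3, seed=0):
    """Simulate browsing with a back button.

    At each step the surfer presses "back" with probability `back_prob`
    (popping the history stack, if non-empty); otherwise it follows a
    uniformly random out-link and pushes the current page on the stack.
    Returns the number of visits to each page.  (A toy version of a
    backoff process; the probabilities are illustrative.)
    """
    rng = random.Random(seed)
    history = []
    current = start
    visits = {current: 1}
    for _ in range(steps):
        if history and rng.random() < back_prob:
            current = history.pop()            # back button
        else:
            links = out_links.get(current, [])
            if not links:                      # dead end: forced back
                if not history:
                    break
                current = history.pop()
            else:
                history.append(current)
                current = rng.choice(links)    # follow a hyperlink
        visits[current] = visits.get(current, 0) + 1
    return visits

if __name__ == "__main__":
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]}
    print(backoff_walk(web, "A", 10000))
```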

Geometric Algorithms

An important problem in VLSI design is distributing a clock signal to the synchronous elements of a circuit so that the signal, distributed by means of a clock routing tree, arrives at all elements simultaneously. The problem is to construct such a clock tree with minimal total edge length. A recent paper (SODA '99) gives the first constant-factor approximation algorithms for this problem and some variants.

A recent paper (SODA '99) gives a (3/2 + e)-factor approximation algorithm (for any e > 0) for the metric Steiner tree problem for the special case of quasi-bipartite graphs. These are graphs that do not have edges connecting pairs of Steiner vertices.

In the k-medians problem, given a set S of n points in a metric space and a positive integer k, the goal is to locate k medians in the space such that the sum of the distances from each of the points of S to the nearest median is minimized. A paper (STOC '98) obtains the first polynomial time approximation schemes for the Euclidean k-medians problem.
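
For concreteness, the following is a simple single-swap local-search heuristic for k-medians, with the medians restricted to the input points; it is only a baseline to illustrate the objective, not the polynomial-time approximation scheme of the paper.

```python
import itertools
import random

def kmedians_cost(points, medians, dist):
    """Sum over points of the distance to the nearest chosen median."""
    return sum(min(dist(p, m) for m in medians) for p in points)

def kmedians_local_search(points, k, dist, seed=0):
    """Single-swap local search for k-medians (medians chosen among the
    input points).  A simple baseline heuristic, not the Euclidean PTAS
    referred to above."""
    rng = random.Random(seed)
    medians = set(rng.sample(range(len(points)), k))
    best = kmedians_cost(points, [points[i] for i in medians], dist)
    improved = True
    while improved:
        improved = False
        for out_i, in_i in itertools.product(list(medians), range(len(points))):
            if in_i in medians:
                continue
            cand = (medians - {out_i}) | {in_i}
            c = kmedians_cost(points, [points[i] for i in cand], dist)
            if c < best:                      # accept any improving swap
                medians, best, improved = cand, c, True
                break
    return [points[i] for i in medians], best

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    euclid = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    meds, cost = kmedians_local_search(pts, 2, euclid)
    print(meds, round(cost, 3))
```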

Data Mining

Consider two user populations, of which one is targeted and the other is not, both Markovian. Each time a user arrives at a state, he/she is presented with an advertisement appropriate for the targeted population with some probability. Presenting the advertisement incurs a cost. While the revenue grows in proportion to the flow of targeted users through the state, the cost grows in proportion to the total flow (targeted and untargeted) through the state. A recent paper (STOC '99) studies optimal policies for the simple problem above.
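
A small worked sketch of the trade-off, under the simplifying assumption of a static policy that shows the advertisement at a state exactly when the expected revenue from targeted flow there exceeds the cost charged on total flow. The transition matrices and the revenue/cost figures below are invented for illustration and are not from the paper.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a Markov chain (row-stochastic P)."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

# Hypothetical 3-state site, one chain per population (targeted / other).
P_targeted = np.array([[0.1, 0.6, 0.3],
                       [0.2, 0.2, 0.6],
                       [0.5, 0.3, 0.2]])
P_other    = np.array([[0.4, 0.3, 0.3],
                       [0.3, 0.4, 0.3],
                       [0.3, 0.3, 0.4]])

targeted_share = 0.2              # fraction of all users that are targeted
revenue_per_targeted_view = 1.0
cost_per_view = 0.3               # paid for every user who is shown the ad

flow_t = targeted_share * stationary(P_targeted)            # targeted flow
flow_total = flow_t + (1 - targeted_share) * stationary(P_other)

# Show the ad at a state only if the expected profit there is positive.
for s in range(3):
    profit = revenue_per_targeted_view * flow_t[s] - cost_per_view * flow_total[s]
    print(f"state {s}: profit/step = {profit:+.3f} ->",
          "show ad" if profit > 0 else "skip")
```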

Recommendation systems track past actions of a group of users to make recommendations to individual members of the group. The growth of computer-mediated marketing and commerce has led to increased interest in such systems. A paper (FOCS '98) introduces a simple analytical framework for recommendation systems, including a basis for defining the utility of such a system and some algorithmic methods.

Segmentation problems are a novel genre of optimization problems that are motivated by certain aspects of clustering and data mining. A paper (STOC '98) introduces this model and devises approximation algorithms for a number of such problems.

Data Processing

Classical information theory assumes that a bit can exist in one of two states, say, 0 and 1. In contrast, a quantum information bit can exist in a superposition of quantum states. The problem of compression is central to the storage and transmission of quantum data. A recent paper (IEEE IT '00) (joint work with Science and Technology) studies the problem of compressing a sequence of symbols emitted by a memoryless quantum Bernoulli source. It proposes computationally efficient, reversible quantum algorithms to compress the quantum sequence to its theoretical limit, namely, the von Neumann entropy.
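
The compression limit itself is easy to compute: the von Neumann entropy is S(rho) = -sum_i lambda_i log2 lambda_i over the eigenvalues of the source's density matrix. The sketch below evaluates it for an illustrative qubit Bernoulli source (the particular states and probabilities are made up).

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -sum_i lambda_i * log2(lambda_i) over the eigenvalues of
    the density matrix rho; this is the compression limit (qubits per
    source symbol) for a memoryless quantum source."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]           # drop zero eigenvalues
    return float(-np.sum(eigvals * np.log2(eigvals)))

# Illustrative Bernoulli-type qubit source: emit |0> with prob 0.9 and
# |+> = (|0> + |1>)/sqrt(2) with prob 0.1.  Because the states are not
# orthogonal, the entropy is strictly below the classical H(0.1) ~ 0.469.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.9 * np.outer(ket0, ket0) + 0.1 * np.outer(ketp, ketp)
print(round(von_neumann_entropy(rho), 4), "qubits per symbol")
```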

Reliability of various storage and transmission channels, such as magnetic/optical disk drives, can be substantially improved by matching the characteristics of the recorded or transmitted signals to those of the channel. Constrained codes are used to convert an arbitrary source signal to a coded signal that satisfies a given channel constraint. Group members have classified the computational complexities of several fundamental problems in the field of constrained block coding. For example, certain problems of minimizing the size of logic circuits for block encoders and decoders are shown to be complete for the second and third levels of the polynomial-time hierarchy. The apparent intractability of these problems does not diminish the practical need to solve them. A recent paper (IEEE JSAC '00) (joint work with Storage Systems and Technology) proposes heuristic algorithms for the problems of minimizing the size of logic circuits for block encoders and decoders.
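
To make the notion of a channel constraint concrete, the sketch below checks the classic (d, k) run-length-limited constraint used in recording channels; it is an example of the kind of constraint a block encoder must guarantee on its output, not the specific codes studied in the paper.

```python
def satisfies_rll(bits, d, k):
    """Check the (d, k) run-length-limited constraint used in recording
    channels: between any two consecutive 1s there must be at least d
    and at most k zeros.  (Boundary runs are ignored in this sketch.)"""
    last_one = None
    for i, b in enumerate(bits):
        if b == 1:
            if last_one is not None:
                zeros = i - last_one - 1
                if not (d <= zeros <= k):
                    return False
            last_one = i
    return True

if __name__ == "__main__":
    print(satisfies_rll([0, 1, 0, 0, 1, 0, 0, 0, 1], d=2, k=7))   # True
    print(satisfies_rll([1, 1, 0, 0, 1], d=1, k=3))               # False: adjacent 1s
```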

Delta compression is the algorithmic task of finding common strings between versions of data and using the commonality between versions to compactly encode one version by describing it as a set of changes from its companion. Delta compression can be applied, for example, to reduce the network bandwidth required for backup/restore and for software updating. In work with the Storage Systems department, group members have invented new techniques and algorithms with good compression performance for this problem. A paper is available. These algorithms are the basis for the Adaptive Differencing Technology that is now being incorporated in the Tivoli Storage Manager product.
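
The core copy/insert idea can be sketched in a few lines: index blocks of the reference version, then encode the new version as a mix of copy instructions and literal bytes. The following is only an illustration of that idea, not the Adaptive Differencing algorithms themselves.

```python
def delta_encode(reference, target, block=4):
    """Greedy copy/insert delta encoding sketch.

    Index every `block`-byte substring of `reference`, then scan
    `target`: where a block matches, extend the match and emit a
    ('copy', offset, length) instruction; otherwise emit a one-byte
    ('insert', byte) literal.  Real delta compressors use far more
    careful indexing and encoding; this only illustrates the idea.
    """
    index = {}
    for i in range(len(reference) - block + 1):
        index.setdefault(reference[i:i + block], i)

    ops, i = [], 0
    while i < len(target):
        j = index.get(target[i:i + block])
        if j is None:
            ops.append(("insert", target[i:i + 1]))
            i += 1
        else:
            length = block
            while (j + length < len(reference) and i + length < len(target)
                   and reference[j + length] == target[i + length]):
                length += 1                       # extend the match
            ops.append(("copy", j, length))
            i += length
    return ops

def delta_decode(reference, ops):
    """Rebuild the target version from the reference and the delta."""
    out = b""
    for op in ops:
        if op[0] == "copy":
            _, j, length = op
            out += reference[j:j + length]
        else:
            out += op[1]
    return out

if __name__ == "__main__":
    old = b"the quick brown fox jumps over the lazy dog"
    new = b"the quick red fox jumps over the very lazy dog"
    delta = delta_encode(old, new)
    assert delta_decode(old, delta) == new
    print(len(delta), "instructions instead of", len(new), "raw bytes")
```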

Online Algorithms

Consider a scenario in which there is a cost associated with each input of an n-input function. The objective is to devise algorithms that learn the value of the function by sequentially querying the inputs while minimizing the total cost incurred. A recent paper investigates this model and provides competitive algorithms for evaluating Boolean and other functions.
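
As a toy illustration of the cost model (not the paper's competitive algorithms), consider evaluating an AND of n inputs with per-input query costs by probing the inputs from cheapest to most expensive and stopping at the first 0:

```python
def evaluate_and(values, costs):
    """Evaluate AND(x_1, ..., x_n) when reading input i costs costs[i].

    Toy strategy: probe inputs in increasing order of cost, stopping as
    soon as a 0 is seen (a single 0 certifies the answer).  Returns the
    value and the total cost spent.  This only illustrates the cost
    model; the paper's algorithms are competitive for general functions.
    """
    spent = 0
    for i in sorted(range(len(values)), key=lambda i: costs[i]):
        spent += costs[i]            # pay to read input i
        if values[i] == 0:
            return 0, spent          # one 0 proves the AND is 0
    return 1, spent                  # had to read everything

if __name__ == "__main__":
    print(evaluate_and([1, 1, 0, 1], costs=[5, 1, 7, 2]))  # (0, 15)
```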

Online scheduling and resource allocation problems also arise in the context of parallel database and network servers. A paper (SODA '98) gives algorithms based on a new notion of fairness that minimizes the maximum factor by which a job is slowed down by other jobs competing for resources. Another paper (JPDC '98) provides algorithms for sharing resources at a given time, such as allocating a fraction of the processors to each competing job.

In the area of storage systems, problems often arise because of the unpredictable nature of I/O requests. A paper (HPCS '98) introduces and evaluates a new algorithm for garbage collection in log-structured disk arrays and file systems.
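
For context, the classical greedy policy, cleaning the least-utilized segment first, can be sketched as follows; it is shown only to make the problem concrete and is not necessarily the algorithm evaluated in the paper.

```python
def greedy_collect(segments, needed_free):
    """Greedy garbage collection for a log-structured array.

    `segments` maps segment id -> fraction of live data (utilization).
    Repeatedly pick the least-utilized segment: cleaning it requires
    copying its live fraction elsewhere and reclaims its dead fraction.
    Stops once `needed_free` segments' worth of space is reclaimed.
    (Classical greedy baseline, for illustration only.)
    """
    freed, moved, order = 0.0, 0.0, []
    for seg, util in sorted(segments.items(), key=lambda kv: kv[1]):
        if freed >= needed_free:
            break
        moved += util            # live data that must be copied elsewhere
        freed += 1.0 - util      # dead space reclaimed in this segment
        order.append(seg)
    return order, freed, moved

if __name__ == "__main__":
    utilization = {"s0": 0.9, "s1": 0.2, "s2": 0.5, "s3": 0.05}
    print(greedy_collect(utilization, needed_free=1.0))
```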

Computational Models

Group members have devised algorithms that need to make only a single pass (or a small number of passes) over a data set. A paper (SIGMOD '98) addresses one-pass quantile estimation, and another paper (SIGMOD '99) addresses space-efficiency issues. The complexity issues are addressed in another paper.
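
A simple baseline for one-pass quantile estimation is to keep a uniform reservoir sample and answer quantile queries from it; the papers above give one-pass algorithms with much stronger space/accuracy guarantees. The sketch below uses the classic reservoir-sampling step.

```python
import random

def approx_quantile(stream, q, sample_size=1000, seed=0):
    """One-pass approximate quantile via reservoir sampling.

    Keep a uniform random sample of `sample_size` elements in a single
    pass over the stream, then return the q-quantile of the sample.
    This is only a simple baseline; the papers above give one-pass
    algorithms with provable space/accuracy trade-offs.
    """
    rng = random.Random(seed)
    reservoir = []
    for n, x in enumerate(stream, start=1):
        if len(reservoir) < sample_size:
            reservoir.append(x)
        else:
            j = rng.randrange(n)             # classic reservoir step
            if j < sample_size:
                reservoir[j] = x
    reservoir.sort()
    idx = min(int(q * len(reservoir)), len(reservoir) - 1)
    return reservoir[idx]

if __name__ == "__main__":
    data = (i * 37 % 100000 for i in range(100000))   # a long stream
    print(approx_quantile(data, 0.5))                 # roughly 50000
```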

Checking computations/properties via careful random sampling is a quick and useful paradigm for ensuring the correctness of programs. The problem of checking the associativity of tables is considered in a paper (FOCS '96), where a randomized quadratic-time algorithm is given.
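
For contrast, the brute-force associativity check below runs in O(n^3) time on an n x n multiplication table; naively sampling random triples is not a reliable shortcut, since a single violated triple can hide among the n^3 candidates, which is what makes the quadratic-time randomized algorithm interesting.

```python
def is_associative(table):
    """Exhaustively check whether the binary operation given by
    `table` (table[a][b] = a*b on elements 0..n-1) is associative.
    This brute-force check takes O(n^3) time; the randomized algorithm
    in the paper above does the job in roughly quadratic time."""
    n = len(table)
    for a in range(n):
        for b in range(n):
            ab = table[a][b]
            for c in range(n):
                if table[ab][c] != table[a][table[b][c]]:
                    return False
    return True

if __name__ == "__main__":
    # Z_3 under addition mod 3: associative.
    add3 = [[(a + b) % 3 for b in range(3)] for a in range(3)]
    print(is_associative(add3))        # True
    # Corrupt one entry to break associativity.
    add3[2][2] = 2
    print(is_associative(add3))        # False
```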

A recent paper (STOC '99) considers whether a prover can help a verifier reliably compute a function faster than the verifier could on its own, when it suffices for the verifier to know that the answer is close to correct. The model used is based on interactive proof systems, probabilistically checkable proofs, program checkers, and CS proofs. The model of spot-checking is introduced in a paper (STOC '98), which gives spot-checkers for some basic problems.

Complexity Theory

Boolean branching programs are simple, yet powerful models of computation. A recent paper (FOCS '99) obtains a non-linear time lower bound for Boolean branching programs computing an explicit function. A fundamental issue in complexity theory is understanding determinism vs. non-determinism. A recent paper (STOC '99) shows that the latter is strictly more powerful than the former for a reasonably general model of computation.

Lattices play an important role in many algebraic and combinatorial algorithms. A long-standing open problem has been to understand the complexity of computing the shortest non-zero vector in a lattice. A paper (STOC '98) resolves this question by showing that the shortest vector problem in L2 is NP-hard under randomized reductions. Another paper (STRUCTURES '99) shows that the uniqueness version of this problem is also NP-hard. Techniques for generating hard instances of the short basis problem will be presented in an invited talk (ICALP '99).
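
To make the definition concrete, the toy search below finds the shortest non-zero vector of a low-dimensional lattice by brute force over a bounded box of integer coefficients; the hardness results above, of course, concern high-dimensional lattices where nothing like this is feasible. The example basis is invented.

```python
import itertools
import numpy as np

def shortest_vector_bruteforce(basis, bound=5):
    """Search integer combinations x in [-bound, bound]^n of the basis
    columns and return the shortest non-zero vector found (L2 norm).
    Exponential in the dimension, and only correct if the true shortest
    vector has coefficients inside the box -- a toy illustration of the
    SVP definition, nothing like a real lattice algorithm."""
    B = np.asarray(basis, dtype=float)
    n = B.shape[1]
    best_vec, best_norm = None, float("inf")
    for coeffs in itertools.product(range(-bound, bound + 1), repeat=n):
        if all(c == 0 for c in coeffs):
            continue                             # skip the zero vector
        v = B @ np.array(coeffs)
        norm = float(np.linalg.norm(v))
        if norm < best_norm:
            best_vec, best_norm = v, norm
    return best_vec, best_norm

if __name__ == "__main__":
    basis = [[201, 37],     # the columns of this matrix are the basis vectors
             [199, 40]]
    v, norm = shortest_vector_bruteforce(basis, bound=10)
    print(v, round(norm, 3))
```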

Descriptive complexity is the complexity of describing problems in some logical formalism. There is often a close connection between the descriptive complexity of a problem and its computational complexity. A paper (STOC '98) on the topic discusses the closure of monadic NP.

