IBM®
Skip to main content
    United States [change]    Terms of use
 
 
 
    Home    Products    Services & solutions    Support & downloads    My account    
IBM Research

(none)

Activity Report


  Activity Report, January- June 2009
clear image

Highlights

Better Algorithms for Benign Bandits
The online multi-armed bandit problem and its generalizations are repeated decision making problems, where the goal is to select one of several possible decisions in every round, and incur a cost associated with the decision, in such a way that the total cost incurred over all iterations is close to the cost of the best fixed decision in hindsight. The difference in these costs is known as the regret of the algorithm. The term bandit refers to the setting where one only obtains the cost of the decision used in a given iteration and no other information. Perhaps the most general form of this problem is the non-stochastic bandit linear optimization problem, where the set of decisions is a convex set in some Euclidean space, and the cost functions are linear. Only recently an efficient algorithm attaining O(sqrt(T)) regret was discovered in this setting. Elad Hazan and Satyen Kale (Yahoo) propose a new algorithm for the bandit linear optimization problem which obtains a regret bound of O(sqrt(Q)), where Q is the total variation in the cost functions. This regret bound, previously conjectured to hold in the full information case, shows that it is possible to incur much less regret in a slowly changing environment even in the bandit setting. The new algorithm is efficient and applies several new ideas to bandit optimization such as reservoir sampling. The work was published in SODA 2009.

Efficient Sketches for Earth-Mover Distance, with Applications
Computing the earth-mover distance is a fundamental problem in geometric optimization. Together with nearest neighbor search methods, algorithms for computing it have been applied to fast image search in large collections of images. In a joint work Alexandr Andoni, Khanh Do Ba, and Piotr Indyk (all from MIT), David Woodruff provided the first sub-linear sketching algorithm for estimating the planar earth-mover distance with a constant approximation. Namely, for points living in the two-dimensional grid [Δ] x [Δ], an algorithm with space Δε for O(1/ε)-approximation is achieved, for any 0 < ε < 1. The sketch has immediate applications to the streaming and nearest neighbor search problems. This work will appear in FOCS 2009.

Faster Counting of the Number of Inversions in a Permutation
Timothy Chan (U. Waterloo) and Mihai Patrascu give an O(n sqrt(lg n))-time algorithm for counting the number of inversions in a permutation on n elements. This improves a long-standing previous bound of O(n lg n / lglg n) that followed from Dietz's data structure [WADS'89], and answers a question of Andersson and Petersson [SODA'95]. As Dietz's result is known to be optimal for the related dynamic rank problem, this result demonstrates a significant improvement in the offline setting. The technique finds numerous applications to problems in computational geometry. Work in submission.

Hellinger Strikes Back: A Note on the Multi-Party Information Complexity of AND
The AND problem on t bits is a promise decision problem where either at most one bit of the input is set to 1 (NO instance) or all t bits are set to 1 (YES instance). T.S. Jayram gave a new and simplified proof of an Ω(1/t) lower bound on the information complexity of AND in the number-in-hand model of communication. This has applications to proving space lower bounds on estimating the frequency moments and norms. The proof exploits the information geometry of communication protocols via Hellinger distance in a novel manner and avoids the analytic approach inherent in previous work. This paper appeared in RANDOM 2009.

How Hard is it to Approximate the Best Nash Equilibrium?
The quest for a PTAS for Nash equilibrium in a two-player game is a major open question in Algorithmic Game Theory. Elad Hazan and Robert Krauthgamer (Weizmann) show that the optimization version of problem, namely, finding in a two-player game an approximate equilibrium achieving large social welfare is unlikely to have a polynomial time algorithm. Technically, our result is a reduction from a seemingly unrelated yet notoriously difficult problem in modern combinatorics, of finding a planted (but hidden) clique in a random graph. This work appeared in SODA 2009.

Lower bounds for Edit Distance and Product Metrics via Poincare-Type Inequalities
Alexandr Andoni (MIT), T.S. Jayram, and Mihai Patrascu show that any sketching protocol for edit distance achieving a constant approximation requires nearly logarithmic (in the strings' length) communication complexity. This is an exponential improvement over the previous lower bound. From a technical perspective, our contribution was a direct-sum theorem for sketching product metrics: for any metric X that requires sketch size which is a sufficiently large constant, sketching the max-product metric Ld(X) requires Ω(d) bits. The proof uses a novel technique for information complexity based on Poincare inequalities and suggests an intimate connection between non-embeddability, sketching and communication complexity. Work in submission.

On the Communication Complexity of Read-Once AC0 Formulae
T.S. Jayram, joint with Swastik Kopparty (MIT), Prasad Raghavendra (formerly at Univ. of Washington, now at Georgia Tech.) studied the 2-party randomized communication complexity of read-once AC0 formulae. For balanced AND-OR trees with n inputs and depth d, they show that the communication complexity is Ω( n/4d ) while for general AND-OR trees, the communication complexity is n/2Ω(d log d). These results generalize the classical results on the communication complexity of set-disjointness and the tribes functions. Their techniques build on and extend the information complexity methodology for proving lower bounds on randomized communication complexity. This paper appeared in CCC 2009.

On the Possibility of Better SAT Algorithms
Mihai Patrascu and Ryan Williams (IAS) investigate the prospects for an algorithm for Boolean CNF satisfiability (CNF-SAT) that runs in O(2δn) time for some δ<1. In such a case one would say CNF-SAT has an improved algorithm. Several plausible hypotheses are presented. The authors prove that if any of the hypotheses are true, then CNF SAT has an improved algorithm. For other candidate hypotheses, they imply that k-SAT can be solved in 2δn time for a fixed δ<1 that is independent of k. Such results would be breakthroughs in exact algorithms. The contrapositive statements show that if one assumes satisfiability does not admit improved algorithms, then many interesting lower bounds follow. The reductions presented provide tight connections between satisfiability and foundational problems in parameterized complexity, graph theory, communication complexity, and computational geometry. Work in submission.

Online Learning Algorithms for Changing Environments
The standard way of measuring performance for online learning algorithms is to look at the regret. The regret compares the algorithms performance with the best possible fixed decision. This is a good measure of performance when the underlying input distribution is fixed or slowly changing with time. When the underlying distribution makes sudden changes, an algorithm could be performing badly but still give low regret. Elad Hazan and C. Seshadhri define a stronger performance measure called adaptive regret and design algorithms that give much better guarantees than old online learning algorithms. They use a bootstrapping procedure that takes old learning algorithms and outputs a much stronger algorithm. Using some streaming ideas, this conversion can be done efficiently. This work appeared in ICML 2009.

Optimal Rank Bounds for Depth-3 Identities
The problem of identity testing is a central problem in theoretical computer science. Given a circuit that computes a polynomial P, we wish to check if it is identically zero (P has no non-zero coefficients). A major open problem is to get a deterministic black box polynomial time algorithms for this. Such an algorithm cannot see the circuit, and is only allowed to evaluate the polynomial P at various domain points. In recent years, there has been some progress on various restricted versions of this problem. The depth 3 case, where one restricts the depth of this circuit, is still very wide open. With Nitin Saxena, C. Seshadhri made progress on this case, and presented almost efficient black box algorithms. This work resolves a question posed by Dvir and Shpilka (2005) dealing with the ranks of depth-3 identities. On of the interesting features of this result is a toolkit designed to study these circuits that is of independent interest. This work appeared in CCC 2009.

Querying Parse Trees of Stochastic Context-Free Grammars
Stochastic context-free grammars (SCFGs) have long been recognized as useful for a large variety of tasks including natural language processing, morphological parsing, speech recognition, information extraction, Web-page wrapping and even analysis of RNA. A string and an SCFG jointly represent a probabilistic interpretation of the meaning of the string, in the form of a (possibly infinite) probability space of parse trees. The problem of evaluating a query over this probability space is considered under the conventional semantics of querying a probabilistic database. For general SCFGs, extremely simple queries may have results that include irrational probabilities. But, for a large subclass of SCFGs (that includes all the standard studied subclasses of SCFGs) and the language of tree-pattern queries with projection (and child/descendant edges), Benny Kimelfeld and Sara Cohen (Technion) show that query results have a polynomial-size representation and, more importantly, an efficient query-evaluation algorithm is presented.

Reverse Data Exchange: Coping with Nulls
A schema mapping gives a description of how to convert data from one representation (a source schema) to another (a target schema). An inverse of a schema mapping M is intended to "undo'' what M does, thus providing a way to perform "reverse'' data exchange. In recent years, a number of different formalizations of this concept have been introduced and studied. They are all based on the common assumption that the source instances do not contain null values. This assumption on source instances is crucial for obtaining some of the main technical results about these notions, but, at the same time, limits their usefulness, since reverse data exchange naturally leads to source instances that may contain nulls. Ron Fagin, Phokion Kolaitis, Lucian Popa, and Wang-Chiew Tan have developed a new framework for reverse data exchange that supports source instances that may contain nulls, This paper appeared in the 2009 ACM Symposium on Principles of Database Systems (PODS '09).

Robustness of Evolvability
A framework for analyzing the computational capabilities and the limitations of the evolutionary process of random change guided by selection was recently introduced by Valiant (2006). In his framework the process of acquiring a complex functionality is viewed as a constrained form of PAC learning. In addition to the basic definition, a number of natural variants of the evolvability model were introduced by Valiant, and several others have been suggested since then. Understanding the relative power of these variants in terms of the efficiently evolvable function classes they define is one of the main open problems regarding the model. Vitaly Feldman presented several results that collectively demonstrate that the notion of evolvability is robust to a variety of reasonable modifications of the model. His results show that the power of a model of evolvability essentially depends only on the fitness metric used in the model. In particular, he proves that the classes of functions evolvable in Valiant's original model are also evolvable with substantially weaker and simpler assumptions on the mutation algorithm and the selection rule and therefore a wide range of models of evolution are equivalent. Another consequence of his results if that evolvability with the quadratic loss fitness metric (or any other non-linear loss) is strictly stronger than evolvability with the linear loss fitness metric used in Valiant's basic definition. The results of this work were presented at the Conference on Learning Theory (COLT), 2009.

Sorting and Selection with Imprecise Comparisons
In this work sorting and selection are considered in the the following model of imprecise comparisons: there exists some δ>0 such that when two elements are compared, if the values of those elements differ by at least δ, then the comparison will be made correctly; when the two elements have values that are within δ, the outcome of the comparison is unpredictable. In this model, the standard method of paired comparisons (Thurstone, 1927) minimizes the errors introduced by the imprecise comparisons at the cost of n(n-1)/2 comparisons. Miklos Ajtai, Vitaly Feldman, Avinatan Hassidim (MIT) and Jelani Nelson showed that the same optimal guarantees can be achieved using 4n3/2 comparisons, and proved the optimality of their method. They then explored the general tradeoff between the guarantees on the error that can be made and number of comparisons for the problems of sorting, max-finding, and selection. Their results provide close-to-optimal solutions for each of these problems. The results of this work were presented at International Colloquium d on Automata, Languages and Programming (ICALP), 2009.

The Rank Problem in Succinct Data Structures
The rank problem in succinct data structures is to preprocess an array A[1..n] of bits into a data structure using as close to n bits as possible, and answer queries of the form rank(k) = ∑ i=1..k A[i]. The problem has been intensely studied, and features as a subroutine in a majority of succinct data structures. Mihai Patrascu showed that in the cell probe model with w-bit cells, if rank takes t time, the space of the data structure must be at least n + n/wO(t) bits. This redundancy/query trade-off is essentially optimal, matching the author's upper bound from [FOCS'08]. Work in submission.

Awards and Honors

P. Kolaitis received the best paper award in the 12th International Conference on Database Theory (ICDT, 2009). Joint work with Foto Afrati.

P. Kolaitis was a Plumer Visiting Research Fellow at St. Anne's College, University of Oxford, January-March, 2009.

E. Hazan was invited to give a talk at the 2nd CompView Fall School on Computational View to Learning Theory, September 17-19, 2009, in the vicinity of Tokyo, Japan.


Publications and Conferences

Foto N. Afrati, Rada Chirkova, Manolis Gergatsoulis, Benny Kimelfeld, Vassia Pavlaki, Yehoshua Sagiv: On rewriting XPath queries using views. EDBT 2009, pp. 168-179.

Foto Afrati and Phokion G. Kolaitis, "Repair Checking in Inconsistent Databases: Algorithms and Complexity", 12th International Conference on Database Theory (ICDT 2009), pp, 31-41, 2009.

Miklos Ajtai, Vitaly Feldman, Avinatan Hassidim and Jelani Nelson. Sorting and Selection with Imprecise Comparisons. Proceedings of the International d Colloquium on Automata, Languages and Programming (ICALP), A. 2009.

A. Bhattacharyya, E. Grigorescu, K. Jung, S. Raskhodnikova, and D. Woodruff, "Transitive-closure spanners", SODA, 2009, pp.932-941.

Balder ten Cate and Phokion G. Kolaitis, "Structural Characterizations of Schema Mappings", 12th International Conference on Database Theory (ICDT 2009), pp, 63--72.

Chandra Chekuri, Kenneth L. Clarkson, and Sariel Har-Peled: "On the set multi-cover problem in geometric settings", in SoCG '09: Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry.

K. Clarkson and D. Woodruff, "Numerical Linear Algebra in the Streaming Model", STOC, 2009, pp. 205-214.

Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, Wang-Chiew Tan, "Reverse Data Exchange: Coping with Nulls", Twenty-Eighth ACM Symposium on Principles of Database Systems (PODS 2009), pp. 23--32.

Sara Cohen, Benny Kimelfeld, Yehoshua Sagiv: Running tree automata on probabilistic XML. PODS 2009, pp. 227-236.

Vitaly Feldman. On The Power of Membership Queries in Agnostic Learning. Journal of Machine Learning Research, 10(Feb):163--182, 2009.

Vitaly Feldman. Robustness of Evolvability. Proceedings of the Conference on Learning Theory (COLT), 2009.

Vitaly Feldman and Shrenik Shah. Separating Models of Learning with Faulty Teachers. Theoretical Computer Science journal, 410(19), 2009, pp. 1903-1912 (Special issue on ALT 2007).

Elad Hazan and Satyen Kale, "Better Algorithms for Benign Bandits", SODA, 2009.

Elad Hazan and Robert Krauthgamer, "How Hard is it to Approximate the Best Nash Equilibrium", SODA, 2009.

Elad Hazan and C. Seshadhri, "Efficient learning algorithms for changing environments". In Proc. 26th ICML 2009.

Parikshit Gopalan, Phokion G. Kolaitis, Elitza N. Maneva, Christos H. Papadimitriou, "The Connectivity of Boolean Satisfiability: Computational and Structural Dichotomies", SIAM J. of Computing, vol. 38, no. 6, pp. 2330--2355, 2009.

Benny Kimelfeld, Yehoshua Sagiv, Gidi Weber: ExQueX: exploring and querying XML documents. SIGMOD Conference 2009, pp. 1103-1106.

Phokion G. Kolaitis and Swastik Kopparty, "Random Graphs and the Parity Quantifier", 41st Annual ACM Symposium on Theory of Computing (STOC 2009), pp. 705-714.

N. Saxena and C. Seshadhri: "An Almost Optimal Rank Bound for Depth-3 Identities". In Proc. 24th CCC 2009.

D. Woodruff, "The average-case complexity of counting distinct elements", ICDT, 2009, pp. 284-295.

Patents and Patent Applications

A. Amir and N. Megiddo. METHOD FOR CREATING AND MAINTAINING THREADS OF PHONE/EMAIL/FAX/SMS CONVERSATIONS, U.S. 7,539,295, 5/26/2009.

D. Dash, G.M. Lohman, N. Megiddo, and J. Rao. COMPUTER AUTOMATED DISCOVERY OF INTERESTINGNESS IN FACETED SEARCH, U.S. 7,493,319, 2/17/2009.

R. Fagin, R. Guha, Ph. Kolaitis, R. Kumar, J. Novak, D. Sivakumar, and A. Tomkins, System and Method for Performing a High-Level Multi-Dimensional Query on a Multi-Structural Database, Issued as Patent 7519582 in U.S.04/14/2009.

P. Haas, M. Kutsch, V. Markl, and N. Megiddo. CONSISTENT AND UNBIASED CARDINALITY ESTIMATION FOR COMPLEX QUERIES WITH CONJUNCTS OF PREDICATES, U.S. 7,512,629, 3/31/2009.

P. Hass, V. Markl, and N. Megiddo. CONSISTENT HISTOGRAM MAINTENANCE USING QUERY FEEDBACK, U.S. 7,512,574, 3/31/2009.

N. Megiddo. SYSTEM, METHOD AND PROGRAM PRODUCT FOR AUTOMATICALLY MANAGING CONTRACTS, U.S. 7,480,621, 1/20/2009.

Presentations and Seminars


Transitions
  • Mihai Patrascu finished his postdoc under the Raviv Fellowship, Summer, 2009.
Organization
  • Ken Clarkson, Senior Manager, CS Principles and Methodologies
  • Stella Ayala (Administrative Assistant)
  • Ron Fagin, Manager, Foundations of Computer Science
  • T. S. Jayram, Manager, Algorithms and Computation
  • Miki Ajtai
  • Vitaly Feldman
  • Elad Hazan
  • Benny Kimelfeld (postdoctoral scholar)
  • Phokion Kolaitis
  • Nimrod Megiddo
  • Mihai Patrascu (Raviv postdoctoral scholar)
  • C. Seshadhri (postdoctoral scholar)
  • David Woodruff


    About IBMPrivacyContact

    About IBMPrivacyContact