IBM®
Skip to main content
    United States [change]    Terms of use
 
 
 
    Home    Products    Services & solutions    Support & downloads    My account    
IBM Research

(none)

Activity Report


  Activity Report, July - December 2008
clear image

Highlights

Cascaded Aggregates on Data Streams
The problem of computing cascaded (or multi-level) aggregates is quite basic and is extremely useful in several applications where grouping by some attribute is involved. For example, to compute the average stock market volatility, we have to group the stock market data by the stock symbol, compute the variance of the "logarithmic return on investment" values recorded for that stock, and then compute the average of these values over all stocks. Other examples from business intelligence as well as science include analyzing IP traffic, credit card fraud, low-rank approximation etc. David Woodruff and T.S. Jayram have obtained algorithms for computing specific types of cascaded aggregates, and in particular, their algorithm estimates combinations of frequency moments.

Experience-Induced Neural Circuits That Achieve High Capacity
Over a lifetime, the cortex performs a vast number of different cognitive actions, mostly dependent on past experience. Previously it has not been known how such capabilities can be reconciled, even in principle, with the known resource constraints on the cortex, such as low connectivity and low average synaptic strength. In joint work with Leslie Valiant, Vitaly Feldman described neural circuits and associated algorithms that respect the brain's most basic resource constraints and support the execution of high numbers of cognitive actions when presented with natural inputs. Their circuits simultaneously support a suite of four basic kinds of task that each require some circuit modification: hierarchical memory formation, pairwise association, supervised memorization, and inductive learning of threshold functions. The capacity of the circuits is established via experiments in which sequences of several thousands such actions are simulated by computer and the circuits created tested for subsequent efficacy. Their underlying theory is apparently the only biologically plausible systems-level theory of learning and memory in cortex for which such a demonstration has been performed. This work has been accepted for publication in "Neural Computation" (MIT Press).

On the Multi-Cover Problem in Geometric Settings
Working with Chandra Chekuri and Sariel Har-Peled of the University of Illinois, Ken Clarkson extended algorithms for geometric set cover to the case of geometric multi-cover. Given a set of points P and a collection of geometric shapes F, the problem is to find a minimum cardinality subset of F such that every point in P is covered by (contained in) at least D(p) sets. Here D(p) is an integer demand requirement for p. When the demands D(p)=1 for all p, this the standard set cover problem. The set cover problem in geometric settings admits a polynomial-time approximation algorithm with an approximation ratio that is better than that for the general version. This paper shows that similar improvements can be obtained for the multi-cover problem as well. In particular, we obtain an O(log Opt) approximation for set systems of bounded VC dimension, and an O(1) approximation for covering points by half-spaces in three dimensions. This work will appear at the SoCG 2009, the Symposium on Computational Geometry.

Running Tree Automata on Probabilistic XML
Tree automata (specifically, bottom-up unranked NTA) form a powerful query language over XML documents, because they are as expressive as monadic second-order logic (MSO). The problem of evaluating tree automata over probabilistic XML is investigated. Towards this end, a new model of probabilistic XML is proposed. Its novelty is in allowing order among siblings and featuring double exponential compaction, namely, the probability space of possible worlds can be double exponential in the size of its description in terms of a probabilistic XML document. Evaluating a tree automaton A amounts to computing the probability that A accepts a random possible world. It is shown that FDTA (fully deterministic tree automata) can be evaluated in polynomial time, even under combined complexity. Hence, an NTA can be evaluated in time that is polynomial in the probabilistic XML document and exponential in the NTA, and the general case of evaluating MSO is fixed-parameter tractable. Tree automata can also express constraints (e.g., attribute-free DTDs). When the probabilistic XML document is conditioned on satisfying (constraints expressed by) an FDTA, the above complexity results still hold. Moreover, in the case of evaluating a twig query Q, the running time is exponential in Q (the FDTA is part of the input). Finally, it is shown that sampling, conditioned on satisfying an FDTA, can be efficiently performed. To appear in PODS, 2009. Joint work with Sara Cohen and Yehoshua Sagiv.

The Average Case Complexity of Counting Distinct Elements
The problem of probabilistically estimating the number of distinct elements in a data stream is one of the central problems in data stream computing. To estimate this quantity to within a relative error ε, it is known that if the stream is adversarially ordered, or even randomly ordered, any such algorithm must use Ω(1/ε2) bits of space. This can be prohibitive in practice. In an attempt to bypass this lower bound, David Woodruff studied the average-case complexity of the problem when the streaming data comes from certain distributions. For some natural distributions corresponding to random uncorrelated data, he gives an algorithm with space complexity that beats the lower bound for adversarial and randomly-ordered streams. Nevertheless, for other natural distributions, he shows that the Ω(1/ε2) lower bound still holds when the data is drawn from such distributions. This work will appear in ICDT, 2009.

The Geometry of Binary Search Trees
The splay conjecture is one of the longest standing open problems in data structures (1983). It states that on any sequence of operations, splay trees are at most a constant factor worse than the best binary tree designed with that sequence in mind. If true, this would mean that we can use splay trees as the universal optimal solution, without trying to develop special purpose binary trees optimized for our particular data. Together with Erik Demaine, Dion Harmon, John Iacono, and Daniel Kane, Mihai Patrascu gave an appealing geometric interpretation of binary trees with several consequences: (1) previous lower bounds are easier to understand in this view; (2) discovery of a new lower bound that is at least as good as existing ones, and optimal within a natural class; (3) a previous approximation algorithm that was conjectured to give constant approximation becomes an online tree. This is the first alternative to splay trees that is conjectured to be optimal. This work appeared in SODA 2009.

Transitive-Closure Spanners
Given a directed graph G and an integer k, a k-transitive-closure spanner (k-TC-spanner) of G is a directed graph H that has (1) the same transitive closure of G, and (2) diameter at most k. These spanners were implicitly studied in access control, data structures, and property testing, and properties of these spanners have been rediscovered over the span of 20 years. The main goal in each of these applications is to obtain the sparsest k-TC-spanners. Together with Arnab Bhattacharyya, Elena Grigorescu, Kyomin Jung, and Sofya Raskhodnikova, David Woodruff resolved several open questions concerning the size of sparsest k-TC-spanners and the complexity of computing a k-TC-spanner for a given input graph. As a result, the work greatly improves the query complexity of monotonicity testers of H-minor free graphs, as well as space/time tradeoffs in various access control schemes. The work also gives the first sublinear approximation algorithm for the more general problem of computing a directed k-spanner, thereby resolving the main open question of Elkin and Peleg (IPCO, 2001). This work will appear in SODA, 2009.

Awards and Honors

K. Clarkson was elected an ACM Fellow in 2008.

T.S. Jayram won an IBM Outstanding Innovation Award for "Processing Large Data Sets: Algorithms and Computational Limits".

N. Megiddo received the 16th Plateau Invention Achievement award, 11/28/08.

N. Megiddo received the Invention achievement awards, 11/28/08, 8/29/08, 7/31/08.

N. Megiddo received the Patent Issue award, 8/29/08.

M. Patrascu received the Machtey Award for Best Student Paper in the 49th IEEE Symposium on the Foundations of Computer Science (FOCS'08).

D. Woodruff was invited to visit and perform ongoing research at the University of Bonn for a week.


Publications and Conferences

Miklos Ajtai: Optimal lower bounds for the Korkine-Zolotareff parameters. Theory of Computing, Vol. 4, No.1, pp. 21-51.

Miklos Ajtai: Representing Hard Lattices with O(n log n) bits. Chicago Journal of Theoretical Computer Science, Vol. 2008, Article 4, pp. 1-55.

Alexandr Andoni, Dorian Croitoru, and Mihai Patrascu: Hardness of Nearest-Neighbor Search under L-infinity. In Proc. 49th IEEE Symposium on Foundations of Computer Science (FOCS'08), pages 424-433.

Timothy Chan and Mihai Patrascu: Point Location in Sublogarithmic Time and Other Transdichotomous Results in Computational Geometry SIAM Journal on Computing, accepted. Special issue for FOCS'06.

Timothy Chan, Mihai Patrascu, and Liam Roditty: Dynamic Connectivity: Connecting to Networks and Geometry. In Proc. 49th IEEE Symposium on Foundations of Computer Science (FOCS'08), pages 95-104.

Sara Cohen, Benny Kimelfeld and Yehoshua Sagiv: Generating All Maximal Induced Subgraphs for Hereditary and Connected-Hereditary Graph Properties. In Journal of Computer and System Sciences (JCSS), Vol 74, Issue 7 (November 2008), pages 1147-1159.

N. Creignou, Ph.G. Kolaitis, and B. Zanuttini: Structure identification of Boolean relations and plain bases for co-clones. Journal of Computer and System Sciences, 74, 2008, pp. 1003-1115.

D. Dash, J. Rao, N. Megiddo, A. Ailamaki and G. Lohman: Dynamic Faceted Search for Discovery-Driven Analysis. In Proceedings of the ACM 17th Conference on Information and Knowledge Management (CIKM 2008, Napa, CA, October 26 - 30, 2008).

Vitaly Feldman: Statistical Query Learning. Invited article in Encyclopedia of Algorithms. Springer Verlag 2008, pp 894-897.

Vitaly Feldman: Hardness of Proper Learning. Invited article in Encyclopedia of Algorithms. Springer Verlag 2008, pp 385-388.

Vitaly Feldman: Hardness of Approximate Two-level Logic Minimization and PAC Learning with Membership Queries. Journal of Computer and System Sciences 75(1), 2009 (special issue on Learning Theory 2005), pp. 13-26

T. S. Jayram, Ravi Kumar, D. Sivakumar: The One-Way Communication Complexity of Hamming Distance. Theory of Computing 4(1): 129-135 (2008).

H. Jin, J. Lotspiech, and N. Megiddo: Efficient Coalition Detection in Traitor Tracing. In Proceedings of the IFIP TC11 23rd International Information Security Conference (SEC 2008, Milan, Italy, September 8-10, 2008), S. Cimato and S. Jajodia and P. Samarati, Eds.,  pp. 365-380, Springer, 2008.

S. Kale, Y. Peres, C. Seshadhri: Noise Tolerance of Expanders and Sublinear Expander Reconstruction. In Proc. 49th IEEE FOCS 2008, 719-728

Benny Kimelfeld, Yehoshua Sagiv: Modeling and Querying Probabilistic XML Data. In ACM SIGMOD Record, Database Principles column (December 2008).

H.F. Korth, Ph. A. Bernstein, M.F. Fernandez, L. Gruenwald, Ph.G. Kolaitis, K.S. McKinley, and M.T.Ozsu: Paper and proposal reviews: is the process flawed? SIGMOD Record, 37 (3), 2008, pp. 36-39.

Mihai Patrascu: (Data) STRUCTURES. In Proc. 49th IEEE Symposium on Foundations of Computer Science (FOCS'08), pages 434-443. Invited to the special issue of SIAM Journal on Computing.

Mihai Patrascu: Succincter. In Proc. 49th IEEE Symposium on Foundations of Computer Science (FOCS'08), pages 305-313. Received the Machtey Award for Best Student Paper.

Mihai Patrascu and Mikkel Thorup: Higher Lower Bounds for Near-Neighbor and Further Rich Problems. SIAM Journal on Computing, accepted. Special issue for FOCS'06.



Patents and Patent Applications

A. Amir and N. Megiddo, "A Method For Creating and Maintaining Threads of Phone/Email/Fax/SMS Conversations", ARC920080036US1, filed 7/21/08.

D. Carmel, D. Cohen, R. Fagin, E. Farchi, M. Herscovici, Y.S. Maarek, and A. Soffer, "Lossy index compression," issued as Patent 7356527 in US, 4/8/08, issued as Patent 4080878 in Japan, 2/15/08.

L. Chiticariu, P. Kolaitis, and L. Popa, "Interactive Generation of Integrated Schema", ARC820070184, filed July, 2008.

R. Fagin, P. Kolaitis, L. Popa, and W-C. Tan, "Quasi-inverses of schema mappings," filed as Docket ARC920070055US2 in US, 4/3/08.



Presentations and Seminars

  • M. Ajtai
    First-order definable bijections and the notion of cardinality
    Plenary talk at Logic Colloquium 2008, the European Summer Meeting of the Association for Symbolic Logic, Bern, Switzerland, July 2008.
  • K. Clarkson
    Tighter Bounds for Random Projections of Manifolds
    Allerton Conference on Communication, Control, and Computing, Sept. 2008.
  • K. Clarkson
    Metric Spaces, Nets, and Applications
    UC Irvine Computer Science Dept. Distinguished Lecturer Series, October 10, 2008.
  • K. Clarkson
    Self-Improving Algorithms for Delaunay Triangulations
    UC Irvine Theory Seminar, October 10, 2008.
  • P. Kolaitis
    Foundations and Applications of Schema Mappings
    Invited Tutorial, 2008 Workshop on Logic and Algorithms (LAA 2008), University of Edinburgh, UK, July 2008.
  • P. Kolaitis
    Computer Science Workshops: Blurring the distinction between conferences and workshops
    Invited Panel Presentation, Panel on "Paper and Proposal Reviews: Is the Process Flawed?"
  • P. Kolaitis
    Answering Aggregate Queries in Data Exchange
    Invited Talk, Computer Science Department, Free University of Bolzano, Bolzano, Italy, October 2008.
  • M. Patrascu
    (Data) Structures
    SUNY Buffalo, December, 2008.
  • M. Patrascu
    Lower Bounds for the Fourier Transform and other Linear Problems
    Berkeley Algorithmists' Meetings, November, 2008.
  • M. Patrascu
    Orthogonal Range Queries
    Berkeley Algorithmists' Meetings, November, 2008.
  • D. Woodruff
    Transitive-Closure Spanners
    University of Bonn, September, 2008.
  • D. Woodruff
    Revisiting Norm Estimation in a Data Stream
    Dagstuhl workshop on sublinear algorithms, August, 2008.

Committee Work

  • K. Clarkson
    Program committee member, SODA, 2009.
  • K. Clarkson
    Program committee member, ALENEX, 2008.
  • R. Fagin
    Program committee chair, International Conference on Database Theory (ICDT), 2009.
  • E. Hazan
    Program committee member, AISTAT, 2009.
  • E. Hazan
    Program committee member, COLT, 2009.
  • T.S. Jayram
    Program committee member, International Conference on Scalable Uncertainty Management (SUM), 2008.
  • B. Kimelfeld
    Program committee member, International World Wide Web Conferences (WWW), 2009.
  • B. Kimelfeld
    Program committee member, International Conference on Data Engineering (ICDE), 2009.
  • P. Kolaitis
    Program committee member, 7th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE), 2008.
  • P. Kolaitis
    Program committee member, 25th International Conference on Data Engineering (ICDE), 2009.
  • N. Megiddo
    Program cluster chair, INFORMS National Meeting, 2008.

Editorships


Books Edited

  • P. Kolaitis
    Complexity of Constraints : An Overview of Current Research Themes, N. Creignou, Ph.G. Kolaitis, H. Vollmer (Eds.), Lecture Notes in Computer Science 5250, Springer, 2008.

External Positions

  • M. Ajtai
    Honorary Research Fellow of the Renyi Institute of The Hungarian Academy of Sciences.
  • P. Kolaitis
    Foreign Member, Finnish Academy of Science and Letters.
  • N. Megiddo
    Member of the Scientific Board, The Institute for Medical BioMathematics, Israel.

Miscellaneous

M. Patrascu completed his Ph.D. at MIT in Sept 2008.

M. Patrascu organized a reading group - the Berkeley Algorithmists' Meetings - at UC Berkeley.

Personnel

Research Interests
  • M. Ajtai: Complexity theory, combinatorics, and logic, and the connections between them
  • K. Clarkson: Computational geometry, design and analysis of algorithms, optimization
  • R. Fagin: Applications of logic to computer science, reasoning about knowledge, database theory, finite-model theory
  • V. Feldman: Learning theory, computational models of the brain, complexity theory
  • E. Hazan: Online learning, optimization, computational finance, game theory
  • T. S. Jayram: Complexity theory, algorithms for massive data sets, probabilistic databases
  • B. Kimelfeld: Database theory, XML, query optimization, enumeration algorithms and complexity, probabilistic databases, incomplete information, ranking techniques, information retrieval
  • P. Kolaitis: Logic in computer science, database theory, computational complexity
  • N. Megiddo: Mathematical programming, algorithms and complexity, game theory, machine learning
  • M. Patrascu: Data structures, lower bounds, exact algorithms
  • C. Seshadhri: Sublinear algorithms and property testing, approximation algorithms, computational geometry, machine learning
  • D. Woodruff: Algorithms, complexity theory, and cryptography

Transitions
  • Balder ten Cate of the University of Amsterdam visited from January 2008 to September 2008.
  • Phokion Kolaitis stepped down as Senior Manager of the Theory Group in September 2008.
  • Ken Clarkson became the Senior Manager of the Theory Group in September 2008.

Organization
  • Ken Clarkson, Senior Manager, CS Principles and Methodologies
  • Stella Ayala (Administrative Assistant)
  • Ron Fagin, Manager, Foundations of Computer Science
  • T. S. Jayram, Manager, Algorithms and Computation
  • Miki Ajtai
  • Vitaly Feldman
  • Elad Hazan
  • Benny Kimelfeld (postdoctoral scholar)
  • Phokion Kolaitis
  • Nimrod Megiddo
  • Mihai Patrascu (Raviv postdoctoral scholar)
  • C. Seshadhri (postdoctoral scholar)
  • David Woodruff

    About IBMPrivacyContact

    About IBMPrivacyContact