Activity Report, January - June 2007
- Highlights
- Awards and Honors
- Publications and Conferences
- Patents
- Invited Talks
- Presentations and Seminars
- Committee Work
- Editorships
- External Positions
- Miscellaneous
- Personnel
- All activity reports
Highlights
Quasi-inverses of schema mappings
Schema mappings are high-level specifications that describe the relationship between two database schemas. Two operators on schema mappings, namely the composition operator and the inverse operator, are regarded as especially important. Progress on the study of the inverse operator was not made until very recently, as even finding the exact semantics of this operator turned out to be a fairly delicate task. Furthermore, this notion is rather restrictive, since it is rare that a schema mapping possesses an inverse. Ron Fagin, Phokion Kolaitis, Lucian Popa, and Wang-Chiew Tan (UC Santa Cruz) introduced and studied the notion of a quasi-inverse of a schema mapping. This notion is a principled relaxation of the notion of an inverse of a schema mapping; intuitively, it is obtained from the notion of an inverse by not differentiating between instances that are equivalent for data-exchange purposes. They derived a number of properties, including a necessary and sufficient combinatorial condition for the existence of a quasi-inverse. A paper describing these results was presented at the 26th ACM Symposium on Principles of Database Systems (PODS 2007); the full version of this paper was subsequently invited to a special issue of ACM TODS.
New mechanisms for sharing the cost of public excludable goods
The only known general technique for designing truthful, approximately budget-balanced cost-sharing mechanisms is due to Moulin. However, Moulin mechanisms inevitably suffer from poor budget-balance, poor economic efficiency, or both. Aranyak Mehta, Tim Roughgarden (Stanford) and Mukund Sundararajan (Stanford) propose a new framework of acyclic mechanisms, which strictly generalize Moulin mechanisms and offer three important advantages: they are easier to design starting with off-the-shelf polynomial time primal-dual algorithms, they provide exponentially better efficiency and budget balancedness than Moulin mechanisms, and they apply to more general settings including variable demand utilities. A paper describing these results was presented at the 8th ACM Conference on Electronic Commerce (EC 07).
Online learning with prior information
The standard so-called experts algorithms are methods for utilizing a given set of "experts" to make good choices in a sequential decision-making problem. In the standard setting of experts algorithms, the decision maker chooses repeatedly in the same "state" based on information about how the different experts would have performed, if chosen to be followed. Elad Hazan and Nimrod Megiddo extended this framework by introducing state information. More precisely, the experts algorithm may rely on partial information regarding the cost function, which is revealed to him before choosing an action. This extension is very natural in prediction problems. For illustration, an experts algorithm, which is supposed to predict whether the next day will be rainy, can be extended to predicting the same given the current temperature. This work introduces new algorithms that attain optimal performance in the new framework, and apply to more general settings than variants of regression considered in the statistics literature. A paper describing this work was presented at 20th Annual Conference on Learning Theory (COLT 2007).
Adaptive online gradient descent
Peter Bartlett (UC Berkeley), Elad Hazan, and Alexander Rahklin (UC Berkeley) propose a new online learning algorithm, which is more efficient than previous algorithms for a general optimization framework called online convex optimization. The new algorithm, Adaptive Online Gradient Descent, interpolates between the previous results of Zinkevich for linear functions and those of Hazan et al. for strongly convex functions, achieving optimal performance for both extremes and intermediate cases. Finally, they provide an extension of these results to general norms.
Computational equivalence of fixed points and no regret algorithms, and convergence to equilibria
Elad Hazan and Satyen Kale (Princeton) studied the relation between notions of game-theoretic equilibria which are based on stability under a set of deviations, and empirical equilibria which are reached by rational players. Rational players are modelled by players using no regret algorithms, which guarantee that their payoff in the long run is almost as large as they could have hoped to achieve by consistently deviating from the algorithm's suggested action. They show that, for a given set of deviations over the strategy set of a player, it is possible to efficiently approximate fixed points of a given deviation if and only if there exist efficient no regret algorithms resistant to the deviations. Further, they show that if all players use a no regret algorithm, then the empirical distribution of their plays converges to an equilibrium.
Estimating statistical aggregates on probabilistic data streams
The probabilistic-stream model was introduced by Jayram et al. in 2007. It is a generalization of the data stream model that is suited to handling probabilistic data. Designing efficient aggregation algorithms for probabilistic data is crucial for handling uncertainty in data-centric applications such as OLAP, and other setting such as analyzing search engine traffic and aggregation in sensor networks. T.S. Jayram, Andrew McGregor (UCSD), S. Muthukrishan (Google Research), and Erik Vee (previously a postdoc at IBM, and currently at Yahoo! Research) present new algorithms for computing commonly used aggregates on a probabilistic stream. They present the first one-pass streaming algorithms for estimating the expected mean of a probabilistic stream, which also exponentially improves the space bounds in the previous multi-pass algorithm. Their work also considers the problem of estimating frequency moments for probabilistic streams. They propose a general approach to obtain unbiased estimators working over probabilistic data by utilizing unbiased estimators designed for standard streams. Applying this approach, they extend a classical data stream algorithm to obtain a one-pass algorithm for estimating F2, the second frequency moment. They also present streaming algorithms for estimating F0, the number of distinct items, and for estimating the median on probabilistic streams. A paper describing this work was presented at the 26th ACM Symposium on Principles of Database Systems (PODS 2007).
Lower bounds for randomized read/write stream algorithms
Motivated by the capabilities of modern storage architectures, Paul Beame (U. of Washington), T.S. Jayram, and Atri Rudra (U. of Washington) consider a generalization of the data-stream model, where the algorithm has sequential access to multiple streams and can also write onto streams. There is no limit on the size of the streams, but the number of passes made on the streams is restricted. On the other hand, the amount of internal memory used by the algorithm is scarce, similar to the data stream model. They resolve the main open problem in previous work of proving lower bounds in this model for algorithms that are allowed to have 2-sided error. Previously, such lower bounds were shown only for deterministic and 1-sided error randomized algorithms. For the classical set disjointness problem---an invaluable reduction candidate for deriving lower bounds for many other problems involving data streams and other randomized models of computation---they show a near-linear lower bound on the size of the internal memory used by a randomized algorithm with 2-sided error that is allowed to have o(log N/ log log N) passes over the streams. This bound is almost optimal since there is a simple algorithm that can solve this problem using logarithmic memory if the number of passes over the streams is allowed to be O(log N). Applications include near-linear lower bounds on the internal memory for well-known problems in the literature: (1) approximately counting the number of distinct elements in the input (F0); (2) approximating the frequency of the mode of an input sequence (3) computing the join of two relations; and (4) deciding if some node of an XML document matches an XQuery (or XPath) query. Their techniques involve a novel direct-sum type of argument that yields lower bounds for many other problems. Their results asymptotically improve previously known bounds for any problem even in deterministic and 1-sided error models of computation. A paper describing these results was presented at the 39th ACM Symposium on Theory of Computing (STOC 2007).
Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm
Ken Clarkson has previously shown a relationship between coresets, a concept arising in geometric algorithms, and sparse greedy approximation, arising in machine learning. These notions are dual, in the technical sense of convex optimization duality, and can be understood as instances of the Frank-Wolfe algorithm, a venerable technique for convex optimization. In this work, he showed that coresets exist for general optimization problems, with special cases including the machine learning techniques of boosting, and of support vector machines. These results in turn allow some sample-complexity results for these techniques to be sharpened or simplified, and the practical "core vector machine" algorithm to be simplified and generalized.
The computational hardness of estimating edit distance
The edit distance between two strings is defined as the number of insertions/deletions/substitutions needed to transform one string into the other. This distance plays a key role in several fields like computational biology and text processing. Edit distance appears to be notoriously difficult to deal with algorithmically (both theoretically and in practice), but no nontrivial computational lower bounds for it were previously known. Alex Andoni (MIT) and Robert Krauthgamer proved strong lower bounds on the communication complexity of estimating (approximating) the edit distance. These new results provide the first separation between Hamming distance and edit distance in a computational setting. This proof easily extends to special cases like permutations, and also provides new non-embeddability bounds that extend recent results in this area. These results were accepted for presentation at the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007).
Earth mover distance over high-dimensional spaces
The Earth Mover Distance (EMD) between two equal-size sets of points in R^d is defined to be the minimum cost of a bipartite matching between the two pointsets. It is a natural metric for comparing sets of features, and as such, it has received significant interest in computer vision. Motivated by recent developments in that area, Alex Andoni (MIT), Piotr Indyk (MIT), and Robert Krauthgamer, addressed computational problems involving EMD over high-dimensional pointsets. A natural approach is to embed the EMD metric into l_1, and use the algorithms designed for the latter space. Focusing on sets with cardinalities upper-bounded by a parameter s, they achieve a distortion of only O(log s log d). Since in applications the feature sets have bounded size, this approach circumvents a recent Omega(d) lower bound of Khot and Naor that applies when the cardinalities are unrestricted. They also provide a strong lower bound on the multi-round communication complexity of estimating EMD, which in particular strengthens the aforementioned non-embeddability result.
On triangulation of simple networks
Network triangulation is a method for estimating distances between nodes in the network, by letting every node measure its distance to a few beacon nodes, and deducing the distance between every two nodes x,y by using their measurements to their common beacons and applying the triangle inequality. Kleinberg, Slivkins and Wexler [FOCS 2004] initiated a theoretical study of triangulation in metric spaces, and Slivkins [PODC 2005] subsequently showed that metrics of bounded doubling dimension admit a triangulation that approximates arbitrarily well all pairwise distances using only O(log n) beacons per point, where n is the number of points in the network. He then asked whether this term is necessary (for doubling metrics). Robert Krauthgamer provided the first lower bounds on the number of beacons required for a triangulation in some specific simple networks. In particular, these bounds (i) answer Slivkins' question positively, even for one-dimensional metrics, and (ii) prove that, up to constants, Slivkins' triangulation achieves an optimal number of beacons (as a function of the approximation guarantee and the doubling dimension). These results were presented at the 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007).
Distance estimation revisited: Drawing a computational view of embeddings
The Distance Estimation problem in a metric space M asks, for given two points in M, to estimate their distance, in a restricted model of computation such as the communication complexity setting. This problem emerges as an important primitive to many other metric problem (such as the Nearest Neighbor Search), and thus requires a more comprehensive understanding. Alex Andoni (MIT) and Robert Krauthgamer initiated a systematic study of the distance estimation problem in the communication complexity setting. This setting appears to be inherently a closer paradigm to computational problems than the approach of embedding into normed spaces. Since the latter is one of the prevailing approaches to many metric problems, a natural and important goal is to quantify how much more powerful is the distinction between these two paradigms. Their results include upper bounds for basic metrics that go beyond what is possible through the embeddings approach, and they also reprove in this new model some known lower bounds for hard metrics. Finally, they prove a relation that bounds how much distance estimation can go beyond embeddings into normed spaces.
Metric clustering via consistent labeling
The extensive body of work on a fundamental optimization problems in metric spaces, such as computing separating and padded decompositions, sparse covers, and metric triangulations, has sought absolute bounds that hold for every possible metric space (or for a family of metrics). Initiating the study of relative guarantees, which compare the produced solution to the optimal one for the input at hand, Robert Krauthgamer and Tim Roughgarden (Stanford) designed the first approximation algorithms for these problems. Their technical approach is to cast a number of metric clustering problems that have been well studied---but almost always as disparate problems---into a common modeling and algorithmic framework, which they call the Consistent Labeling problem. Having identified the common features of all of these problems, they then provide a family of linear programming relaxations and simple randomized rounding procedures that achieve provably good approximation guarantees.
Awards and Honors
R. Fagin's paper "Functional dependencies in a relational database and propositional logic" was selected as a significant paper published in the first 50 years of the IBM journals.
Ron Fagin received an IBM supplemental Patent Issue Award, given for key IBM patents, for an encryption keys patent.
Ron Fagin was named a Fellow of the AAAS (American Association for the Advancement of Science) for "fundamental contributions to computational complexity theory, database theory, and the theory of multi-agent systems".
Phokion Kolaitis was elected a Foreign Member of the Finnish Academy of Science and Letters in April 2007.
Publications and Conferences
M. Ajtai. Generalizations of the Compactness Theorem and Godel's Completeness Theorem for Nonstandard Finite Structures. In Proc. Theory and Applications of Models of Computation (TAMC'07), pp. 1-22, May, 2007.
Delbert D. Bailey, Victor Dalmau, Phokion G. Kolaitis, Phase transitions of PP-complete satisfiability problems, Discrete Applied Mathematics, 155 (12), 2007, 1627-1639.
P. Beame, T. S. Jayram, and A. Rudra. Lower bounds for randomized read/write stream algorithms. In Proc. 39th ACM Symposium on Theory of Computing (STOC), pp. 689-698, 2007.
D. Burdick, P. M. Deshpande, T. S. Jayram, R. Ramakrishnan, and S. Vaithyanathan. OLAP over uncertain and imprecise data. VLDB J. 16(1):123-144, 2007.
C. Daskalakis, A. Mehta, and C. Papadimitriou. Progress in Approximate Nash equilibrium. In Proc. 8th ACM conference on Electronic Commerce (ACM EC'07).
A. Deutsch, B. Ludascher, and A. Nash. Rewriting queries using views with access patterns under integrity constraints. Theoretical Computer Science, Volume 371, Issue 3, March 2007.
R. Fagin, Ph. Kolaitis, L. Popa, and W-C.Tan, Quasi-inverses of schema mappings, Proc. 26th ACM Symposium on Principles of Database Systems (PODS), pp. 123-132, 2007.
E. Halperin, G. Kortsarz, R. Krauthgamer, A. Srinivasan, and N. Wang. Integrality Ratio for Group Steiner Trees and Directed Steiner Trees. SIAM J. Comput. 36(5):1494-1511, 2007.
E. Hazan and N. Megiddo. Online Learning with prior information. In Proc. 20th Annual Conference on Learning Theory (COLT 2007).
T. S. Jayram, A. McGregor, S. Muthukrishnan, and E. Vee. Estimating statistical aggregates on probabilistic data streams. In Proc. 26th ACM Symposium on Principles of Database Systems (PODS), pp. 243-252, 2007.
S. Kapoor, A. Mehta, and V. Vazirani. An Auction-Based Market Equilibrium Algorithm for a Production Model. Theoretical Computer Science, 378:2, pp. 153-164. Special Issue on Internet and Network Economics.
R. Krauthgamer: On triangulation of simple networks. In Proc. 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 8-15, 2007.
R. Legrand, E. Markakis, and A. Mehta. Approval Voting: Approximation Algorithms and Heuristics for the Minimax Solution. In Proc. 6th International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'07).
A. Mehta, T. Roughgarden, and M. Sundararajan. Beyond Moulin Mechanisms. In Proc. 8th ACM conference on Electronic Commerce (ACM EC'07).
A. Nash, P. A. Bernstein, and S. Melnik. Composition of mappings given by embedded dependencies. ACM Transactions on Database Systems (TODS) Volume 32, Issue 1, March 2007.
A. Nash and A. Deutsch. Privacy in GLAV information integration. In Proc. ICDT 2007.
A. Nash, L. Segoufin, and V. Vianu. Determinacy and rewriting of conjunctive queries using views: a progress report. In Proc. ICDT 2007.
Phokion G. Kolaitis, The Expressive Power of Logics on Finite Models in Finite Model Theory and its Applications, EATCS Series: Texts in Theoretical Computer Science, Springer, 2007, pp. 27--124.
Phokion G. Kolaitis and Moshe Y. Vardi,
A Logical Approach to Constraint Satisfaction in
Finite Model Theory and its Applications,
EATCS Series: Texts in Theoretical Computer Science, Springer, 2007, pp. 339--371.
Patents
R. Fagin, L. Haas, M.A. Hernandez, R.E. Miller, F.G. Naumann, L. Popa. Illustrating schema mappings. Issued as Patent 7149746 in US, 12/12/06.
N. Megiddo, A. Tomkins and S. Vaithyanathan. Method and system for searching and retrieving documents, Issued as patent 7,209,913 in US, 4/24/2007.
N. Megiddo and D. Modha. System and method for adaptively managing pages in a memory. Issued as patent 7,167,953 in US, 1/23/2007.
Invited Talks
- M. Ajtai
-
Generalizations of the Compactness Theorem and Godel's Completeness Theorem for
Nonstandard Finite Structures.
Plenary speaker, Theory and Applications of Models of
Computation 2007 (TAMC'07), Shanghai, China, May, 2007
- K. Clarkson
-
Sparse Greedy Approximation, and the Frank-Wolfe Algorithm.
Dagstuhl seminar on Computational Geometry, March 2007.
- R. Fagin
-
Finite model theory -- how it all began.
Milner Lecture, University of Edinburgh, June 2007.
- R. Krauthgamer
-
Lower bounds for edit distance estimation.
Workshop on Geometry and Algorithms, Heriot-Watt University, Edinburgh, UK, April 2007.
- N. Megiddo
-
Game Theory and Complexity.
The Hebrew University of Jerusalem, May 2007.
Presentations and Seminars
- K. Clarkson
-
"Approximating Surfaces with Meshes"
Graphics Lunch at UC Berkeley, April 2007.
- R. Fagin
-
"Operators on schema mappings"
University of Edinburgh, UK, June 2007.
- E. Hazan
-
"New algorithms for online learning and portfolio management"
UC Berkeley Theory Seminar, February 2007; Stanford Algorithms Seminar, May 2007; The Hebrew University Theory Seminar, May 2007.
- R. Krauthgamer
-
"On embedding edit distance into L_1."
University of Washington, February 2007.
- A. Nash
-
"Determinacy and rewriting of conjunctive queries using views."
University of Pennsylvania, February 2007.
Committee Work
- R. Fagin
-
Program committee member, Scalable Uncertainty Management, May 2007
- T. S. Jayram
-
Program committee member, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007)
- R. Krauthgamer
-
Program committee member, 39th ACM Symposium on Theory of Computing (STOC 2007)
Program committee member, 10th International Workshop on Approximation Algorithms for Combinatorial Optimization (APPROX 2007)
- A. Nash
-
Program committee member, 26th ACM Symposium on Principles of Database Systems (PODS 2007)
Editorships
- K. Clarkson
Associate Editor, Computational Geometry: Theory and Applications
Associate Editor, Operations Research Letters
- R. Fagin
Editor, Chicago Journal of Theoretical Computer Science
Associate Editor, Journal of Computer and System Sciences
Editorial Board Member, Foundations and Trends in Databases
- P. Kolaitis
Managing Editor, Annals of Pure and Applied Logic
Editor, Bulletin of Symbolic Logic
Associate Editor, ACM Transactions on Database Systems
Editorial Board Member, Journal of Logic and Computation
Editorial Board Member, Theory of Computing Systems
Editorial Board Member, International Journal of Foundations of Computer Science
Advisory Board Member, Logical Methods in Computer Science
- R. Krauthgamer Associate Editor, Theory of Computing
- N. Megiddo
Editor-in-Chief, Mathematics of Operations Research
Area Editor, Journal of the ACM
Area Editor, Operations Research Letters
Editorial Board Member, Algorithmica
Editorial Board Member, Discrete Applied Mathematics
Editorial Board Member, Discrete Optimization
Editorial Board Member, Games and Economic Behavior
Editorial Board Member, Journal of Combinatorial Optimization
Associate Editor, Mathematical Programming
External Positions
- M. Ajtai Member of the Hungarian Academy of Sciences.
- P. Kolaitis Foreign Member, Finnish Academy of Science and Letters.
- N. Megiddo Member of the Scientific Board, The Institute for Medical BioMathematics, Israel.
Miscellaneous
R. Fagin and R. Krauthgamer attended the IBM Watson Theory Fest workshop, July 23-28, 2006.
R. Krauthgamer served as the local organizer for the 2006 Bay Area Theory Symposium that was held in IBM Almaden on December 1, 2006.
N. Megiddo served as a guest editor for a special issue of Theoretical Computer Science.
Personnel
- Foto Afrati (February 1-15, 2007)
- Amit Charkrabarti (June 1 - 29, 2007)
- Marcelo Arenas (June 4-8, 2007)
Transitions (in chronological order)
- Ken Clarkson, joined the department as a Research Staff Member in January 2007.
- Christina Howell left the department in March 2007.
- Desra Halka joined the department as an administrative assistant April 2007.
- Phokion Kolaitis, Senior Manager, CS Principles and Methodologies
- Desra Halka (Administrative Assistant)
- Ron Fagin, Manager, Foundations of Computer Science
- Miki Ajtai
- Ken Clarkson
- Elad Hazan
- T. S. Jayram
- Robert Krauthgamer
- Nimrod Megiddo
- Elitza Maneva
- Aranyak Mehta
- Alan Nash

