
Highlights
Comparing and aggregating rankings with ties
Rank aggregation is a useful abstraction that has several applications, including meta-search, synthesizing rank functions from multiple indices, similarity search, and classification. In database applications (catalog searches, fielded searches, parametric searches, etc.), the rankings are produced by sorting an underlying database according to various fields. Typically, there are a number of fields that each have very few distinct values, and hence the corresponding rankings have many ties in them. Known methods for rank aggregation are poorly suited to this context, and the difficulties can be traced back to the fact that we do not have sound mathematical principles to compare two "partial rankings", that is, rankings that allow ties. Ron Fagin, Ravi Kumar, and D. Sivakumar, together with 2003 summer interns Mohammad Mahdian (MIT) and Erik Vee (U. Washington), have provided a comprehensive picture of how to compare partial rankings. They proposed several metrics to compare partial rankings, presented algorithms that efficiently compute them, and proved that they are within constant multiples of each other. Based on these concepts, they formulated aggregation problems for partial rankings, and developed a highly efficient algorithm to compute the top few elements of a near-optimal aggregation of multiple partial rankings. This paper appeared in the 2004 ACM Symposium on Principles of Database Systems (PODS 2004).
Composing Schema Mappings: Second-Order Dependencies to the Rescue
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data. A schema mapping is a specification that describes how data structured under the source schema is to be transformed into data structured under the target schema. Schema mappings play a key role in numerous areas of database systems, where data is converted from one format to another. Ron Fagin, Phokion Kolaitis, Lucian Popa, and Wang-Chiew Tan (UC Santa Cruz) have given a rigorous semantics to the composition of schema mappings. They investigated the definability and computational complexity of the composition of two schema mappings, and obtained some surprises along the way. This paper appeared in the 2004 ACM Symposium on Principles of Database Systems (PODS 2004).
Locally Consistent Transformations and Query Answering in Data Exchange
Given a source instance in a data exchange scenario as in "Composing Schema Mappings: Second-Order Dependencies to the Rescue" above, there may be many solutions - target instances that satisfy the constraints of the data exchange problem. The "certain answers" to a query over the target schema are those tuples that appear in every possible solution. Ron Fagin, along with Marcelo Arenas, Pablo Barcelo, and Leonid Libkin of the University of Toronto, studied the question of when a single solution, that is, a single target database, can be used to find the certain answers. This paper appeared in the 2004 ACM Symposium on Principles of Database Systems (PODS 2004).
On the Memory Requirements of XPath Evaluation over XML Streams
The important challenge of evaluating XPath queries over XML streams has sparked much interest in the past two years. A number of algorithms have been proposed, supporting wider fragments of the query language, and exhibiting better performance and memory utilization. Nevertheless, all the algorithms known to date use a prohibitively large amount of memory for certain types of queries. A natural question then is whether this memory bottleneck is inherent or just an artifact of the proposed algorithms. Ziv Bar-Yossef, in a joint work with M. Fontoura and V. Josifovski (IBM Almaden), initiated the first systematic and theoretical study of lower bounds on the amount of memory required to evaluate XPath queries over XML streams. They presented a general lower bound technique, whose proof uses tools from communication complexity, and exploited insights learned from the lower bounds to obtain a new algorithm for XPath evaluation on streams. The algorithm uses space close to the optimum, and deviates from the standard paradigm of using automata or transducers. A paper about this work appeared in the 23rd ACM Symposium on Principles of Database System (PODS 2004).
The Sketching Complexity of Pattern Matching
Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer and Ravi Kumar addressed the problems of pattern matching and approximate pattern matching in the sketching model. They showed it is impossible to compress the text into a small sketch and use only the sketch to decide whether a given pattern occurs in the text. They also proved a sketch size lower bound for approximate pattern matching, and showed it is tight up to a logarithmic factor. A paper about this work was accepted to the 8th International Workshop on Randomization and Computation (RANDOM 2004).
The black-box complexity of nearest neighbor search
A natural notion of efficiency for approximate nearest-neighbor (ANN) search in general n-point metric spaces, is the existence of a randomized algorithm which answers (1+epsilon)-approximate nearest neighbor queries in polylog(n) time using only polynomial space. Robert Krauthgamer, in a joint work with James R. Lee (UC Berkeley) suggest this definition and then study which families of metric spaces admit efficient ANN schemes in the black-box model, where only oracle access to the distance function is given, and any query consistent with the triangle inequality may be asked. For epsilon < 2/5, they offer a complete answer to this problem, using the notion of metric dimension defined by [Gupta, Krauthgamer and Lee, 2003] (a la [Assouad, 1983]), namely, efficient ANN is possible whenever the doubling dimension is at most O(log log n). For coarser approximations (larger epsilon), the same algorithm clearly works if the dimension is small, but there is a threshold at which a small dimension is not required, since points in the ``ambient space'' may begin to affect the complexity of ``hard'' subspaces of the metric. A paper about this work appeared in the 31st International Colloquium on Automata, Languages and Programming (ICALP 2004).
Measured descent: A new embedding method for finite metrics
Robert Krauthgamer, in a joint work with James R. Lee (UC Berkeley), Manor Mendel (UIUC) and Assaf Naor (Microsoft Research), devised a new embedding technique, which called measured descent, which is based on decomposing a metric space at a varying speed, according to the local density. This provides a refined and unified framework for the two primary methods of constructing Frechet embeddings for finite metrics, due to [Bourgain, 1985] and [Rao, 1999]. They prove that any n-point metric space embeds in Hilbert space with distortion that has mild dependence on the decomposability of the metric. This result recovers Bourgain's O(log n) bound, but when the metric is, in a sense, ``low-dimensional,'' improved bounds are achieved. This embedding provides the best-possible volume-respecting (for subsets of arbitrary size), which answers positively a question posed in [Feige, 1998]. Similar techniques are also used to answer positively a question of Y. Rabinovich, showing that any weighted n-point planar graph embeds in O(log n) dimensional normed space with constant. This bound on the dimension is optimal. A paper about this work was accepted to the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2004).
Awards and Honors
R. Fagin, ACM SIGMOD Edgar F. Codd Innovation Award, June 2004.
M. Ajtai, IBM Outstanding Innovation Award, June 2004.
S. Rajagopalan, 5th Plateau Patent Achievement.
Publications and Conferences
M. Ajtai, A conjecture about polynomial time computable lattice-lattice functions. Proc. 36th ACM Symposium on Theory of Computing (STOC), 2004, 486-493.
A. Archer, J. Fakcharoenphol, C. Harrelson, R. Krauthgamer, K. Talwar, and E. Tardos. Approximate classification via earthmover metrics. Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1072-1080, January 2004.
M. Arenas, P. Barcelo, R. Fagin, and L. Libkin, Locally consistent transformations and query answering in data exchange. Proc. 23rd ACM Symposium on Principles of Database System (PODS), Paris, 2004, pp. 229-240.
Z. Bar-Yossef, A. Broder, R. Kumar, and A. Tomkins. Sic transit gloria talae: Towards an understanding of the web's decay. Proc. 13th International World-Wide Web Conference, pages 328-337, May 2004.
Z. Bar-Yossef, M. Fontoura, and V. Josifovski. On the Memory Requirements of Evaluating XPath Queries over XML Streams. Proc. 23rd ACM Symposium on Principles of Database Systems (PODS), Paris, 2004.
Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data streams and communication complexity. Journal of Computer and System Sciences, 68(4):702-732, 2004.
T. Batu, R. Kumar, and R. Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. Proc. 36th ACM Symposium on Theory of Computing (STOC), pages 381-390, June 2004.
M. Charikar, J. Kleinberg, R. Kumar, S. Rajagopalan, A. Sahai, and A. Tomkins. Minimizing wirelength in zero and bounded skew clock trees. SIAM Journal on Discrete Mathematics, 17(4):582-595, 2004.
D. Coppersmith and R. Kumar. An improved data stream algorithm for frequency moments. Proc. 15th SIAM Symposium on Discrete Algorithms (SODA), pages 151-156, January 2004.
F. Ergun, R. Kumar, and R. Rubinfeld. Fast approximate probabilistically checkable proofs. Information and Computation, 189(2): 135-159, 2004.
R. Fagin, P. Kolaitis, L. Popa, and W-C. Tan, Composing schema mappings: Second-order dependencies to the rescue. Proc. 23rd ACM Symposium on Principles of Database Systems (PODS), Paris, 2004, pp. 83-94.
R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee, Comparing and aggregating rankings with ties. Proc. 23rd ACM Symposium on Principles of Database Systems (PODS), Paris, 2004, pp. 47-58.
C. Faloutsos, K. S. McCurley and A. Tomkins. Connection Subgraphs in Social Networks. In Workshop on Link Analysis, Counterterrorism, and Privacy. SIAM International Conference on Data Mining, 2004.
D. Gruhl, R. Guha, D. Liben-Nowell and A. Tomkins. Information Diffusion Through Blogspace. Proc. 13th International World Wide Web Conference, New York, New York, 2004.
R. Guha, R. Kumar, P. Raghavan and A. Tomkins. Propagation of trust and distrust. Proc. 13th International World-Wide Web Conference, pages 403-412, May 2004.
K. Hildrum, R. Krauthgamer, and J. Kubiatowicz. Object location in realistic networks. Proc. 16th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 25-35, June 2004.
D. R. Karger, M. Ruhl. Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems. Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), Barcelona, June 2004.
G. Kortsarz, R. Krauthgamer, and J. R. Lee. Hardness of approximation for vertex-connectivity network design problems. SIAM J. Comput., 33(3):704-720, 2004.
R. Krauthgamer and J. R. Lee. Navigating nets: Simple algorithms for proximity search. Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 791-801, January 2004.
R. Krauthgamer, N. Linial, and A. Magen. Metric embeddings-beyond one-dimensional distortion. Discrete Comput. Geom., 31(3):339-356, 2004.
A. Menon and A. Tomkins. Learning About the Market's Periphery: IBM's WebFountain. Long Range Planning, 37(2), pp. 153-162. April 2004.
A. Newman, M. Ruhl. Combinatorial Problems on Strings with Applications to Protein Folding, Proc. Latin American Theoretical Informatics (LATIN), Buenos Aires, pp. 369-378, April 2004.
J. Novak, P. Raghavan and A. Tomkins. Anti-Aliasing on the Web. Proc. 13th International World Wide Web Conference, New York, New York, 2004.
Patents
R. Fagin, N. Megiddo, R. J. T. Morris, S. Rajagopalan, H.J. Rosen and T. G. Zimmerman, Digital pen using speckle tracking, Issued as Patent 6686579 in US, 2/3/2004.
J. J. Forrest, N. Megiddo and J.A. Tomlin, Method for solving a large sparse triangular system of linear equations, Issued as Patent 6694343 in US, 2/17/2004.
N. Megiddo, P. Raghavan and T.F. Syeda-Mahmood, Method and apparatus for representing database and query information using the interval hash tree, Issued as Patent 6757686 in US, 6/29/2004.
N. Megiddo, System for securing electronic mail, Issued as Patent 6745231 in US, 6/1/2004.
N. Megiddo and B. Mendelson, Restructuring of executable computer code and large data sets, Issued as Patent 6742179 in US, 5/25/2004.
N. Megiddo, System and method for maintaining multiple identities and reputations for internet interactions, Issued as Patent 6725269 in US, 4/20/2004.
Invited Talks
- R. Fagin, R. Kumar and D. Sivakumar
- "Open problems inspired by the work of Miki Ajtai",
- Bay Area Theory Symposium (BATS), Microsoft Research Silicon Valley, Mountain View, CA, January 2004.
- R. Krauthgamer
- "Heuristics for the Hidden Clique Problem",
- Bertinoro Workshop on Combinatorial Optimization, Bertinoro, Italy, May 2004.
- A. Tomkins
- "Information Diffusion through Blogspace",
- Presentation and panel, Yahoo! Spot Workshop on
Social Networks, Sunnyvale, CA, June 2004.
- "Online Conversations: Aliases, Topics, and Propagation Networks",
- CMU Aladdin Workshop on Web Structure and Algorithms, Carnegie Mellon University, Pittsburgh, PA, 2004.
Presentations and Seminars
- R. Guha
- "Towards a Web of Data",
- IIT Bombay, India, May 2004.
- "Semantic Web",
- IISc Bangalore, India, June 2004.
- R. Krauthgamer
- "Navigating nets: Simple algorithms for proximity search",
- Princeton University, Princeton, NJ, January 2004, and The Hebrew University, Jerusalem, Israel, June 2004, and Tel-Aviv University, Tel-Aviv, Israel, June 2004.
- R. Kumar
- "On separating nondeterminism and randomization in communication complexity",
- Combinatorics and complexity theory seminars, Institute for Advanced Study, February 2004, and Theory of Computation Seminar, Radcliffe Institute, May 2004.
- "A CS perspective on voting",
- Theory seminar, Princeton, April 2004 and DIMACS Workshop on web search and metasearch, Bertinoro, Italy, June 2004.
- "Internet Search and Meta-Search",
- DIMACS Tutorial on Social Choice and Computer Science, May 2004.
Committee Work
- R. Fagin
- Member, Program committee, Search track of WWW 2004.
- R. Guha
- Member, Program Committee, International Semantic Web Conference, Japan, November 2004.
- Co-organizer, Meaning Negotiation Workshop, Japan, November 2004.
- Member, Program committee, Search track of WWW 2004.
- Member, Program committee, Semantic Web track of WWW 2004.
- R. Krauthgamer
- Member, Program committee, STOC 2004.
- R. Kumar
- Member, Program committee, Search track of WWW 2004.
- Member, Program Committee, WebKDD 2004.
- Member, Executive Committee, Workshop on Algorithms and Models for the Web Graph (WAW 2004).
- K. McCurley
- Member, Workshop on Algorithms and Models for the Web Graph (WAW 2004).
- N. Megiddo
- Member, Program Committee, ICML 2004.
- Cluster chair, INFORMS national fall meeting, 2004.
- Member, INFORMS Frederick W. Lanchester Prize committee.
- S. Rajagopalan
- Member, Workshop on Algorithms and Models for the Web Graph (WAW 2004).
- Member, Program Committee, WWW 2004.
- Member, Program Commitee, ICDM 2004.
- D. Sivakumar
- Member, Program committee, STOC 2004.
- Member, Workshop on Algorithms and Models for the Web Graph (WAW 2004).
- Member, Program committee, FOCS 2004.
- A. Tomkins
- Member, Program Committee, IEEE/WIC International Conference on Web Intelligence, 2004.
- Member, Program committee, Search track of WWW 2004.
- Member, Program Committee, WWW 2004 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics.
- Member, Program Committee, International Workshop on the Web and Databases, WebDB 2004.
Editorships
- R. Fagin
Editor, Chicago Journal of Theoretical Computer Science
Associate Editor, Journal of
Computer and System Sciences
- R. Krauthgamer
Associate Editor, Theory of Computing
- N. Megiddo
Editor-in-Chief, Mathematics of Operations Research
Area editor, Journal of the ACM
Area editor, Operations Research Letters
Editorial Board Member, Algorithmica
Editorial Board Member, Discrete Applied Mathematics
Editorial Board Member, Discrete Optimization
Editorial Board Member, Games and Economic Behavior
Editorial Board Member, Journal of Combinatorial Optimization
Associate Editor, Mathematical Programming
- D. Sivakumar
Managing Editor, Theory of Computing
- A. Tomkins
Editorial Board Member, Web Intelligence and Agent Systems.
External Positions
Miscellaneous
- S. Rajagopalan taught CS 294, seminar on "Algorithms for large datasets", at UC Berkeley, Berkeley, CA, Spring 2004.
Personnel
Transitions
- Ziv Bar-Yossef took a leave of absence in June 2004 to join the Electrical Engineering Department at the Technion.
-
David Gibson is on leave to work with the WebFountain function at ARC.
-
Phokion Kolaitis, a Professor of Computer Science at the University of California, Santa Cruz, joined as the department's senior manager in June 2004.
-
Sridhar Rajagopalan is on leave to work in the Content Management Department at ARC.
-
Matthias Ruhl completed his postdoc in June 2004 and will join Google Research.
Organization
Phokion Kolaitis (Senior Manager) CS Principles and Methodologies
Cheryl Rothrock (Administrative Assistant)
Sridhar Rajagopalan (On Leave at the Content Management Department)
Larry Stockmeyer (Emeritus)
Andrew Tomkins (Manager) Information Management Principles
Ziv Bar-Yossef (On Leave at the Technion)
David Gibson (On Leave at the WebFountain function)
Ramanathan Guha
Ravi Kumar
Kevin McCurley
Ron Fagin (Manager) Foundations of Computer Science
Miki Ajtai
T. S. Jayram
Robert Krauthgamer
Nimrod Megiddo
D. Sivakumar
|