Skip to main content

Activity Report

Activity Report, July - December 2003


Highlights

How to Combine Expert (or Novice) Advice when Actions Impact the Environment
There is a known methodology in the field of machine learning which is called "experts algorithms". It applies to situations where a decision maker is required to take actions repeatedly and receives rewards that depend both on the choice of action and on the unknown current state of the environment. An experts algorithm has access to a set of strategies ("experts"), each of which may recommend an action. The algorithm learns how to combine the recommendations of individual experts so that, in the long run, for any fixed sequence of states of the environment, it does as well as the best expert would have done relative to the same sequence. This methodology may not be suitable for situations where the evolution of states of the environment depends on past chosen actions, as is usually the case, for example, in a repeated non-zero-sum game. Daniela de Farias (MIT) and Nimrod Megiddo developed a new experts algorithm and analyzed it in the context of repeated games. It is shown that asymptotically, under certain conditions, it performs as well as the best available expert. This algorithm is quite different from previously proposed experts algorithms. It represents a shift from the paradigms of regret minimization and myopic optimization to consideration of the long-term effect of a player's actions on the opponent's actions or the environment. The importance of this shift is demonstrated by the fact that this algorithm is capable of inducing cooperation in the repeated Prisoner's Dilemma game, whereas previous experts algorithms converge to the suboptimal non-cooperative play. A paper about this algorithm was among 26 papers chosen out of 717 submissions for full oral presentation at the 2003 NIPS conference.

Networks of Trust (and Distrust)
A network of people connected by directed ratings or trust scores, and a model for propagating those trust scores, is a fundamental building block in many of today's most successful e-commerce and recommendation systems. In eBay, such a model of trust has significant influence on the price an item may command. In Epinions, conclusions drawn from the web of trust are linked to many behaviors of the system, including decisions on items to which each user is exposed. R. Guha, Ravi Kumar, and Andrew Tomkins, in joint work with Prabhakar Raghavan (Verity, Inc.), developed a framework of trust propagation schemes, each of which may be appropriate in certain circumstances, and evaluated the schemes on a large trust network consisting of 800K trust scores expressed among 130K people. They showed that a small number of expressed trusts/distrust per individual allows one to predict reliably trust between any two people in the system with high accuracy: a quadratic increase in actionable information. The work appears to be the first to incorporate distrust in a computational trust propagation setting. A paper about the work was accepted to the 13th International World-Wide Web Conference (WWW2004).

Automated Detection of Aliases
To provide online privacy to individuals, one must be able to provide the ability to maintain online pseudonyms that cannot be linked together. Much work has gone into providing unlinkability at the network level. But in order to be useful, pseudonyms must engage in information-rich interactions with the rest of the world. Andrew Tomkins, jointly with Prabhakar Raghavan (Verity, Inc.) and Jasmine Novak (WebFountain), showed that for one important class of interactions, namely posting on online bulletin boards, network-level security alone is inadequate to guarantee privacy. They considered the problem of reverse engineering the multiple aliases belonging to an individual based on text posted by those aliases in public fora such as bulletin boards, netnews, weblogs, or web pages. Their results show that a careful combination of similarity detection and a tailored clustering algorithm can be disturbingly effective at teasing out aliases from large collections of posters. With only a small amount of text (25 posts, about 3000 words on average), the authors showed in one benchmark that one can pair an identity with the single correct pseudonym out of about 200 candidates with accuracy over 90%. A paper about the work was accepted to the 13th International World-Wide Web Conference (WWW2004).

Exponential Separation of Quantum and Classical One-way Communication Complexity
Relating the power of quantum to classical computation in various restricted and unrestricted models is a fundamental problem in complexity theory. For the model of one-way communication complexity, Ziv Bar-Yossef and T.S. Jayram, together with Iordanis Kerenidis (UC Berkeley), have exhibited a problem for which there is a quantum protocol that is exponentially more powerful than any bounded-error randomized protocol. No such asymptotic gap for this model was previously known. Specifically, they define the hidden matching problem: Alice gets as input a string x in {0,1}n and Bob gets a perfect matching M on the n coordinates. Bob's goal is to output a tuple such that: (1) the edge (i,j) belongs to M; and (2) b equals the XOR of xi and xj. It is shown that the quantum one-way communication complexity of this problem is O(log n), yet any randomized one-way protocol with bounded error must use Omega(n1/2) bits of communication. A paper about this work was accepted to the 36th Symposium on Theory of Computing (STOC 2004).

Navigating Nets: Simple Algorithms for Proximity Search
Robert Krauthgamer and James R. Lee (UC Berkeley) devised a simple deterministic data structure for maintaining a set S of points in a general metric space, while supporting proximity search (nearest neighbor and range queries) and updates (insertions and deletions to S). The data structure consists of a sequence of progressively finer epsilon-nets of S, with pointers that allow easy navigation from one scale to the next. The worst-case complexity of this data structure is analyzed in terms of the "abstract dimensionality" of the metric S. The data structure is shown to be extremely efficient for metrics of bounded dimension, and is essentially optimal for general metrics. Finally, as a special case, this approach improves over one recently devised by Karger and Ruhl [STOC, 2002]. A paper about this work appeared in the 15th Symposium on Discrete Algorithms (SODA 2004).

Results on Peer-to-Peer Networks
Peer-to-peer (P2P) networks are collections of networked nodes that collaborate in distributed applications, without any centralized control. Matthias Ruhl and David Karger (MIT) have developed (near-)optimal protocols for a variety of problems for P2P networks. Examples are new load balancing protocols for file sharing applications, the support of complex queries such as range searches for distributed data storage, and the efficient execution of applications that run only on a (dynamic) subset of the nodes in the P2P network. Two papers covering the results are going to appear in the 3rd International Workshop on Peer-to-Peer Systems (IPTPS 2004).

Comparison between Models for Massive Data Set Computations
Many models have been proposed to address computation on massive data sets. Examples are the streaming model, external memory algorithms, and several variations on these models. A common theme among these models is the recognition that sequential access to data is faster than random access on modern computers. However, despite this common feature, the relationships between many of these models are neither obvious nor well understood. Matthias Ruhl and Sridhar Rajagopalan, with Gagan Aggarwal (Stanford) and Mayur Datar (Google), have performed the first systematic comparison of such models, showing that previously proposed models are part of a natural hierarchy.

Object Location in More Realistic Networks
A central problem in the design of structured peer-to-peer applications is object location (a.k.a. lookup), or finding an item in a network of nodes. Robert Krauthgamer, in joint work with Kris Hildrum and John Kubiatowicz (both from UC Berkeley), devised an object location scheme (a.k.a. Distributed Hash Table), whose underlying overlay network adapts well to each node's local view of the underlying network topology. This system achieves a guaranteed low stretch in a wider and more realistic class of networks than previous systems. The system's neighbor selection policy fits the transit-stub network model and it is shown that its node congestion is low. Furthermore, the system is inherently adaptive to the network and robust to errors in network measurements, and as a result it simplifies the design choices of system builders.

Awards and Honors
M. Ajtai won an IBM Research 2002 Best Paper Award in Computer Science, Electrical Engineering and Mathematics for the paper "Determinism versus Non-determinism for Linear Time RAMs with Memory Restrictions", August 2003.

M. Ajtai, R. Fagin and L. Stockmeyer won an IBM Supplemental Patent Issue Award, given for key IBM patents, for a differential backup patent.

R. Fagin won a Bravo! Award for Exceptional Contributions to Invention Evaluation.

N. Megiddo was appointed by INFORMS to be the editor-in-chief of Mathematics of Operations Research in 2004-2006.

Publications and Conferences
N. Eiron and K.S. McCurley, Analysis of Anchor Text for Web Search. Proc. 26th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2003, pp. 459-460.

N. Eiron and K.S. McCurley, Untangaling Compound Documents on the Web. Proc. 14th ACM Conference on Hypertext and Hypermedia, August 2003, pp. 85-94.

R. Fagin, R. Kumar, and D. Sivakumar, Comparing Top k Lists. SIAM Journal on Discrete Mathematics, 17(1):134--160, 2003.

D. de Farias and N. Megiddo, How to Combine Expert (or Novice) Advice when Actions Impact the Environment? Proc. 17th Annual Conference on Neural Information Processing Systems (NIPS), December 2003.

D. de Farias and N. Megiddo, Learning and Teaching in Repeated Games: A Machine Learning Approach to Long-term Best-response Play. Proc. 14th International Conference on Game Theory, July 2003.

A. Gupta, R. Krauthgamer, and J.R. Lee, Bounded Geometries, Fractals, and Low-distortion Embeddings. Proc. 44th IEEE Symposium on Foundations of Computer Science (FOCS), October 2003, pp. 534-543.

D. R. Karger and M. Ruhl, New Algorithms for Load Balancing in Peer-to-Peer Systems. IRIS Student Workshop (ISW), August 2003.

R. Kumar and D Sivakumar, On Polynomial Approximations to the Shortest Lattice Vector Length. SIAM Journal on Discrete Mathematics, 16(3):422--425, 2003.

N. Megiddo and D. S. Modha, One Up on LRU. ;login:, Vol. 28, No. 4, August 2003, pp. 7-11.

Patents
K. McCurley and J.B. Lotspiech, System for Encrypting Broadcast Programs in the Presence of Compromised Receiver Devices, Issued as Patent 6650753 in US, 11/18/2003.

N. Megiddo, Smooth End of Auction on the Internet, Issued as Patent 6665649 in US, 12/16/2003.

N. Megiddo and S. Ng, Non-interfering Seek Behavior Modification for Improved Hard Drive Performance, Issued as Patent 6658535 in US, 12/2/2003.

N. Megiddo and X. Zhu, System, Method and Program Product for Software Development, Issued as Patent 6658642 in US, 12/2/2003.

Invited Talks

  • M. Ajtai
    "Random Methods in the Theory of Lattices", Plenary talk at the meeting of the AMS-IMS-SIAM Joint Summer Research Conference on "Integer points in Polyhedra. Geometry, Number Theory, Algebra, Optimization" in Snowbird, Utah, July 2003.
  • R. Krauthgamer
    "The Intrinsic Dimensionality of Graphs", Workshop on Discrete Metric Spaces and their Algorithmic Applications, Princeton University, NJ, August 2003.
  • A. Tomkins
    Tutorial: "Power Laws and Web Models", with Jon Kleinberg, International Mathematics Association Workshop on "Web Data Analysis and Optimization", Minneapolis, Minnesota, July 2003.
    "The Future of Search", Panel moderated by John Markov, New York Times. Panelists: Sergey Brin, Arkady Borkovsky, Scott Banister, Andrew Tomkins. Stanford University, Palo Alto, CA, October 2003.

Presentations and Seminars

  • Z. Bar-Yossef
    "Template Detection via Data Mining and its Applications", Yahoo Inc., Sunnyvale, CA, September 2003.
  • R. Krauthgamer
    "Near Neighbor Search in General Metrics", UC Berkeley, Berkeley, CA, November,2003. MSRI, Berkeley, CA, December, 2003.
  • R. Kumar
    "An Information Statistics Approach to Data Stream and Communication Complexity", NEC Research Institute, August 2003 and Cornell University, October 2003.
    "Lower Bound for Linfinity Approximations in Data Streams", DIMACS Workshop on Discrete Metric Spaces and their Algorithmic Applications, August 2003.
    "Property Testing and Sublinear Time Algorithms", FST-TCS Workshop on Algorithmic Ideas in Massive Data Sets, Mumbai, India, December 2003.
  • D. Sivakumar
    "Lower Bound Techniques for Massive Data Set Computations", FST-TCS Workshop on Algorithmic Ideas in Massive Data Sets, Mumbai, India, December 2003.
    "Models & Matrix Computations for Massive Data Sets", FST-TCS Workshop on Algorithmic Ideas in Massive Data Sets, Mumbai, India, December 2003.
  • A. Tomkins
    "WebFountain and Semantic Tagging", Technical Presentation, Yahoo Inc., Sunnyvale, CA, September 2003.

Committee Work

  • R. Fagin
    Member, Program committee, ICDT 2003. Member, Program committee, Search track of WWW 2004.
  • R. Guha
    Member, Program committee, Search track of WWW 2004. Member, Program committee, Semantic Web track of WWW 2004.
  • R. Kumar
    Member, Program committee, Search track of WWW 2004. Member, Program committee, WebKDD Workshop on Webmining as a Premise to Effective and Intelligent Web Applications 2003. Member, Program committee, Workshop on Algorithms and Models for the Web (WAW 2003).
  • R. Krauthgamer
    Member, Program committee, STOC 2004.
  • N. Megiddo
    Member, INFORMS Frederick W. Lanchester Prize committee.
  • D. Sivakumar
    Workshop on Algorithmic Ideas in Massive Data Sets, FST-TCS 2003. Member, Program committee, FST-TCS 2003. Member, Program committee, Search track of WWW 2004. Member, Program committee, STOC 2004.
  • A. Tomkins
    Member, Program committee, Search track of WWW 2004.

Editorships


External Positions


Miscellaneous

  • R. Fagin Committee for the IBM/APS summer internship program for undergraduate women.
    Member and Organizer, Program committee of the Almaden Institute on Privacy.
  • M. Ruhl Completed a PhD degree at the Laboratory of Computer Science in MIT.

Personnel

Transitions
David Gibson is on leave to work with the WebFountain function at ARC.
Ramanathan Guha transferred to the department from 8CC.
Jim Hafner is on leave to work with the Storage Attachment project at ARC.
Robert Krauthgamer, a graduate of the Weizmann Institute, joined the department as a Research Staff Member in September 2003.
Sridhar Rajagopalan is on leave to work with the Trevi project in the Text Search for Content Management group at ARC.
Matthias Ruhl, a graduate of MIT and the winner of the 2003 IBM Josef Raviv Memorial Postdoctoral fellowship, joined the department as a postdoc in September 2003.
David Williamson completed three years as the department's senior manager in December 2003. He joined the faculty of the School of Operations Research and Industrial Engineering at Cornell University.

Organization
Ron Fagin (Acting Manager) CS Principles and Methodologies
Cheryl Rothrock (Administrative Assistant)
Jim Hafner (On Leave at the Storage Attachment)
Sridhar Rajagopalan (On Leave at the Trevi project)
Larry Stockmeyer (Emeritus)

Andrew Tomkins (Manager) Information Management Principles
Ziv Bar-Yossef
David Gibson (On Leave at the WebFountain function)
Ramanathan Guha
Ravi Kumar
Kevin McCurley

Ron Fagin (Manager) Foundations of Computer Science
Miki Ajtai
T. S. Jayram
Robert Krauthgamer
Nimrod Megiddo
Matthias Ruhl
D. Sivakumar