IBM®
Skip to main content
    United States [change]    Terms of use
 
 
 
    Home    Products    Services & solutions    Support & downloads    My account    
IBM Research

IBM Research - Almaden

Tesla: On-Demand Information Systems


 Autonomic Index Re-Organization and Index Evolution

Traditional indexing schemes work by partitioning the search space (data or queries) hierarchically, according to some pre-chosen scheme. For example, a balanced tree index might choose to partition the data according to numerical attribute a, with fanout f. A distributed hash table would partition the data items according to their hash values. This partitioning is reflected in two key aspects of an index’s topology:

  • bounding predicates of each index node
  • intra-node links between index nodes (if the bounding predicates are disjoint the intra-node links are deterministic).

The performance of the index on any search request is highly dependent on these two parameters. For the same fan-out, narrow bounding predicates result in quick searches that use small number of hops whereas wide bounding predicates result in slow searches. Likewise if multiple index nodes index the same region they will have lesser load (and hence propagate search requests faster) than if a single index node indexes the same region.

In this project, we present a new bottom-up index construction and adaptation mechanism based on a microeconomic model. Unlike traditional tree indexes or more recent distributed hash tables, our index does not specify the bounding predicates (advertisement) of the index nodes, or the intra-node links, through any centralized scheme. Instead, each index node can choose its bounding predicate arbitrarily.

The index is built as a network of distributed index nodes called Data Location Brokers (DLBs). Each DLB specializes in a subset of the objects, defined by a set of predicates that it advertises – this advertisement is a promise to search for all objects satisfying these predicates. Each DLB contains a list of pointers to other DLBs – each pointer contains the advertisement of the DLB being pointed to (these pointers may be for overlapping ranges). A DLB satisfies search requests (also called queries) using these pointers, by recursively passing on the requests to other DLBs. As the base-case of this recursion, some DLBs have direct links to data objects satisfying the predicates they advertise.

Autonomic Index Image

The picture above shows an example index for a hospital database. DLB2 promises to search for all patient records with age less than 32 so it can accept queries like age in [0,24], age in [16,17], and age=18, but not for a query like age in [17,47]. DLB2 satisfies these request using pointers to DLB5 and DLB6. Note that the advertisement of a DLB is only to constrain the search requests that are forwarded to it by other DLBs. A query to the distributed index can originate at any DLB. So such a query coming into a DLB may be on a predicate that is not contained within that DLB’s advertisement. Therefore each DLB also has a special pointer to a root node DLB0 which advertises the TRUE predicate. This pointer serves as a catch-all to handle any query that cannot be handled by a pointer to a regular DLB. This root node DLB0 need not be a real DLB. Instead it is a concise representation of a broadcast-based exhaustive search.

Incentive Mechanism: Each node is incentivized to index efficiently by being paid a credit each time it routes a search request. Each node independently chooses and continually adjusts both its links and advertisement to maximize its local profits. Unlike prior work that caches links to recently accessed data objects (i.e. the index “leaves”), each DLB continually adjusts its pointers to intermediate nodes (DLBs) of the index. We introduce a dynamic programming algorithm for doing this pointer restructuring at each DLB, and a load-analysis mechanism to help each DLB evolve its advertised bounding predicate. Both of these operations are selfish, local optimizations that each DLB performs to maximize its profit.

Experimental analysis indicates that such local optimization leads to the evolution of an index that meets global goals of efficient searches and load balancing across index nodes. Especially, we find that a collection of index nodes that start from scratch with no inter-node links can evolve into a fat-tree like structure with logarithmic search depth for most requests and load-balancing across tree levels. Second, we show that when the search distribution is skewed and/or varying, our index restructures itself to finely index the hot data items and coarsely index the remainder, effectively forming an adaptive version of a partial index

For Further Details

  • Autonomic Index Evolution. Unpublished manuscript. Also patent application ARC9-2004-0043, 2004.
 Related Projects

    About IBMPrivacyContact