ALTAVISTA,
COMPAQ AND IBM RESEARCHERS CREATE WORLD’S LARGEST, MOST ACCURATE
PICTURE OF THE WEB
"Bow
Tie" Theory Shows the Web is Not as Connected as Previously
Thought
SAN JOSE,
PALO ALTO & SAN MATEO, Calif. -- May 11, 2000 -- Scientists
from IBM Research, Compaq Corporate Research Laboratories
and AltaVista Company have completed the first comprehensive
"map" of the World Wide Web, and uncovered divisive boundaries
between regions of the Internet that can make navigation difficult
or, in some cases, impossible.
Previous
studies, based on small samplings of the Web, suggested that
there was a high degree of connectivity between sites as evidenced
by recent reports on the "small world Web" and 19 degrees
of separation. Contrary to those preliminary findings, the
new study -- based on analysis of more than 500 million pages
-- found that the World Wide Web is fundamentally divided
into four large regions, each containing approximately the
same number of pages. The findings further indicate that there
are massive constellations of Web sites that are inaccessible
by links, the most common route of travel between sites for
Web surfers. The resulting "Bow Tie" Theory explains the
dynamic behavior of the Web and yields insights into its
complex organization.
These
discoveries will help computer scientists better understand
the structure of the Internet, and lead to new technologies
and design advances that will speed and simplify e-business.
"Bow
Tie" Theory Explains the Four Regions of the Web
The
image of the Web that emerged through the research was that
of a bow tie. Four distinct regions make up approximately
90% of the Web (the bow tie), with approximately 10% of the
Web completely disconnected from the entire bow tie.
The "strongly-connected
core" (the knot of the bow tie) contains about one-third
of all Web sites. Web surfers can easily travel between these
sites via hyperlinks; this large "connected core" is at the
heart of the Web.
One side
of the bow contains "origination" pages, constituting
almost one-quarter of the Web. "Origination" pages
are pages that allow users to eventually reach the connected
core, but cannot be reached from it. The other side of the
bow contains "termination" pages, also constituting
almost one-quarter of the Web. "Termination" pages
can be accessed from the connected core, but do not link back
to it. The fourth and final region contains "disconnected"
pages, constituting approximately one fifth of the Web. Disconnected
pages can be connected to origination and/or termination pages
but are not accessible to or from the connected core.
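The four regions described above can be illustrated on a toy link graph. The following is a minimal sketch, not the researchers' method: the page names, the `toy_web` graph and the helper functions are hypothetical, and the sketch assumes that one page already known to lie in the connected core is supplied as a starting point.

```python
from collections import deque


def reachable(graph, start):
    """Return every page that can be reached from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def reverse(graph):
    """Return a copy of the graph with every link pointing the other way."""
    rev = {page: [] for page in graph}
    for source, targets in graph.items():
        for target in targets:
            rev.setdefault(target, []).append(source)
    return rev


def bow_tie(graph, core_page):
    """Split pages into core, origination, termination and disconnected sets.

    Assumption: `core_page` is known to belong to the connected core.
    """
    forward = reachable(graph, core_page)             # pages the core can reach
    backward = reachable(reverse(graph), core_page)   # pages that can reach the core
    core = forward & backward
    origination = backward - core
    termination = forward - core
    all_pages = set(graph) | {t for targets in graph.values() for t in targets}
    disconnected = all_pages - forward - backward
    return core, origination, termination, disconnected


# Hypothetical toy Web: a, b, c link to each other in a cycle (the core).
toy_web = {
    "a": ["b"], "b": ["c"], "c": ["a", "t1"],
    "o1": ["a", "x1"],        # origination page: links into the core
    "t1": ["t2"], "t2": [],   # termination pages: reached from the core
    "x1": [],                 # reached only from an origination page
    "i1": ["i2"], "i2": [],   # island with no links to or from the bow tie
}

core, origination, termination, disconnected = bow_tie(toy_web, "a")
print(sorted(core), sorted(origination), sorted(termination), sorted(disconnected))
```

In this toy graph, page `x1` is linked from an origination page yet lands in the disconnected set, matching the description above: it can be connected to origination pages but is not accessible to or from the connected core.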
Impact
of the Study
With the Bow Tie Theory, and its new explanation of the structure
of the Internet, the scientific and business communities will
now be able to:
- Design
more effective Web crawling strategies. Crawling and then
indexing is the fundamental method employed by search engines
to organize the Internet. To achieve more complete coverage,
AltaVista and other search engines will be able to develop
more advanced crawl strategies to capture more of the Web.
- Increase
the effectiveness of e-commerce. By designing more effective
browsing, advertising, measurement and modeling,
e-commerce sites may adopt different strategies
for attracting surfers from various regions. For example,
an "origination site" will have to increase its efforts
to be easily found by Web crawlers. Once the site is linked
to the connected core, its strategy may then shift to other
traffic-generating measures.
- Analyze
the behavior of Web algorithms that make use of link information.
Because many search engines use link information in ranking
algorithms, they become targets for link "spamming"
intended to create an artificial increase in a site's linkage.
- Predict
and capitalize upon the continued evolution of the Web.
The researchers believe that the Bow Tie structure will
be maintained as the Web grows. While some pages may evolve
into the connected core, new pages will continue to be created
in all three other regions.
- Create
mathematical models for the Web. With these findings,
researchers can now develop new models to study the growth
of the Web and possibly predict the emergence of new, yet
unexplored phenomena on the Web.
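The point about crawling strategies in the list above can be made concrete with a small sketch. It is an illustration only, not AltaVista's crawler: the `crawl` function, the `toy_links` graph and the page names are hypothetical. It shows that a crawler which only follows links from a seed in the connected core never finds an origination page, while adding that page as a seed restores full coverage.

```python
from collections import deque


def crawl(links, seeds):
    """Follow links breadth-first from the seed pages; return the pages found."""
    found = set(seeds)
    frontier = deque(seeds)
    while frontier:
        page = frontier.popleft()
        for target in links.get(page, ()):
            if target not in found:
                found.add(target)
                frontier.append(target)
    return found


# Hypothetical toy Web with a two-page core, one origination page
# and one termination page.
toy_links = {
    "core1": ["core2"],
    "core2": ["core1", "term1"],
    "orig1": ["core1"],   # links into the core; nothing links to it
    "term1": [],          # reached from the core; links nowhere
}

all_pages = set(toy_links)
from_core = crawl(toy_links, ["core1"])
with_orig = crawl(toy_links, ["core1", "orig1"])

print(len(from_core) / len(all_pages))   # coverage from a core seed alone
print(len(with_orig) / len(all_pages))   # coverage once orig1 is also a seed
```

Under these assumptions the first crawl covers three of the four pages and the second covers all four, which is why an origination page has to be discovered by some means other than link-following.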
This
study - the largest ever to be conducted on the topography
of the Web - is part of an ongoing, collaborative project
by AltaVista, Compaq and IBM. The researchers expect to update
the study on a regular basis using data collected with AltaVista’s
search engine and advanced connectivity server software with
a Compaq AlphaServer system containing 16 gigabytes of RAM,
enough to hold the entire Web map in memory. IBM Research
analyzed the data and contributed to the development of the
"Bow Tie" Theory.
The initial
findings will be presented simultaneously at the 9th International
World Wide Web Conference, Amsterdam (May 15-19) and at the
ACM PODS 2000 Conference, Dallas (May 14-19).
Visit
the following link to retrieve the "Web Map/Bow Tie Theory"
conference paper (posted after May 14): http://www9.org/papers/papers.html/
(members of the press community can request an advance copy
of the conference paper by contacting the press contacts at
the companies).
AltaVista
Company
AltaVista Company (www.altavista.com)
is the premier media and commerce network offering Internet
users search, shopping and up-to-the-minute news, live video,
content, and community resources. The company integrates unique
technology, products and services to deliver relevant results
faster for both individuals and businesses. AltaVista is building
on its heritage of technology and innovative leadership, offering
informative services including AltaVista Search, AltaVista
Shopping.com, AltaVista Live! personalized portal, and AltaVista
Free Internet Access combined with the microportal. AltaVista
is a majority-owned operating company of CMGI, Inc. (Nasdaq:
CMGI), Andover, MA. AltaVista is headquartered in Palo Alto,
Calif.
Compaq
Computer Corporation
Compaq Computer Corporation, a Fortune Global 100 company,
is one of the largest suppliers of computing systems in the
world. Compaq designs, develops, manufactures and markets
hardware, software, solutions, and services, including industry-leading
enterprise computing solutions, fault-tolerant business-critical
solutions, and communications products, commercial desktop
and portable products, and consumer PCs. Compaq products and
services are sold in more than 200 countries directly to businesses,
through a network of authorized Compaq marketing partners,
and directly to businesses and consumers through Compaq's
e-commerce Web site at http://www.compaq.com.
Compaq markets its products and services primarily to customers
from the business, home, government, and education sectors.
Customer support and information about Compaq and its products
and services are available at http://www.compaq.com.
IBM
Research
For more information on IBM Research, go to http://www.research.ibm.com