Panel Chair: Chaitanya K. Baru, San Diego Supercomputer Center
This panel will discuss how advances in database technology, hardware technology, and applications will affect database system performance.
Current trends in database systems include the incorporation of parallel processing, object-relational capabilities, and support for data warehousing, data mining, and OLAP. The typical hardware systems on which such database systems are being implemented include SMPs, MPPs, clusters, 64-bit processors, disk caches, and RAID and other high-availability configurations. In addition, as database technology and products have matured, they have been employed in a wide range of traditional business applications, including transaction processing, decision support, and OLAP, as well as newer, emerging applications that require, for example, handling multimedia and spatial data; handling historical data and supporting tertiary storage; and dealing with multiple, heterogeneous data sources.
As the database, hardware, and application trends continue, and new ones emerge, what will be their impact on the overall performance of database systems? What does the future hold?
Topics to be discussed by this panel include:
In the last few years, major changes have occurred in computing as a result of the rapid adoption of the Web (at least 300,000 sites, excluding Intranets, in July 1996) and the popularity of Java. One of the side effects has been conferences that attract thousands of attendees and do include some database-related topics, e.g. Hyper-G and the Java Database Development Toolkit. However, the content of database conferences remains almost unaffected by the Internet. Thus one must assume that database research has not been swayed by the dramatic changes, although most commercial database management systems have incorporated some Web interfaces, gateways and Java constructs.
In traditional database management systems, it is generally assumed that responsible people are in charge of naming, creating directories, defining datatypes, indexing, integrating sites, managing applications etc. The conglomerations of data and applications on the Internet (and on Intranets) are often created and maintained in a bottom-up fashion, with few people in control. Indeed, this is one of the aspects that made rapid adoption possible. Entities from individuals to whole countries can participate on almost equal terms, as illustrated by the July 1993 New Yorker cartoon "On the Internet, nobody knows you're a dog". In some cases individuals are in a better position than countries, e.g. with respect to communications bandwidth, as described in Carl Malamud's "Exploring the Internet - A Technical Travelogue".
Searching, navigation and visualisation have become much more significant than traditional directories for locating people, data and programs. This is due to the very rapid change in the data accessible on the Internet and Intranets, the quantity of information and the lack of control. The sites on the Web that contain indexes built by crawlers are very popular, e.g. AltaVista received 14 million accesses per day in July 1996, with a high rate of increase. The content of most traditional database management systems is invisible to the Internet and Intranet worlds, because it cannot be found easily.
Internet protocols include provision for caching on the network, e.g. through intermediate proxies and on client systems, thereby improving user response time by reducing the number of accesses to data and programs across the network. The communication protocols are supported by firewall systems (which protect Intranets) and incorporate public-key encryption. Heterogeneous data, e.g. text, structured data, images and programs, can be downloaded by users and programs using simple interfaces that are consistent across platforms. Infrastructures are being devised to manage heterogeneous datatypes and programs, and the links between them. Servers dedicated to specific tasks are evolving, e.g. link management, end-user link manipulation, Internet business applications, Java program execution. In a seminar in 1991, Tim Berners-Lee and Robert Cailliau described the fundamental Web constructs:
At the JavaOne conference in May 1996, James Gosling described the history of the Java programming language, and its sudden success when applied to the Web, e.g. the introduction of the idea of Java programs (applets) associated with Web pages. The applet notion is now evolving into a method for distributing client software for client-server systems.
Is it time to create new data management services that embody constructs suited to the current popular environment? If the database community doesn't originate them, someone else will!
Susan Malaika, IBM Hursley, UK, July 1996
The term Data Warehousing is used for database applications with one or more of the following characteristics:
A view is a derived relation defined in terms of base (stored) relations. A view can be materialized by storing the tuples of the view in the database. A materialized view provides fast access to data; the speed difference is critical in applications where the query rate is high and the views are complex or over data in remote databases, so that it is not feasible to recompute the view for every query.
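The distinction between a virtual view and a materialized one can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's sqlite3 module, which has no native materialized views, so materialization is emulated by copying the view's tuples into an ordinary table. The table and function names (sales, sales_by_region, sales_by_region_mv, build_demo, materialize, read_mv) are hypothetical and do not come from the panel text.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a base relation and a virtual view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (region TEXT, amount INTEGER);
        INSERT INTO sales VALUES ('east', 10), ('east', 5), ('west', 7);
        CREATE VIEW sales_by_region AS
            SELECT region, SUM(amount) AS total
            FROM sales
            GROUP BY region;
    """)
    return conn


def materialize(conn):
    """Store the view's tuples in a table (full recomputation, no incremental maintenance)."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_by_region_mv;
        CREATE TABLE sales_by_region_mv AS
            SELECT region, total FROM sales_by_region;
    """)


def read_mv(conn):
    """Answer a query from the stored tuples without touching the base relation."""
    cur = conn.execute(
        "SELECT region, total FROM sales_by_region_mv ORDER BY region"
    )
    return cur.fetchall()
```

Reads against sales_by_region_mv avoid recomputing the aggregate, but the stored tuples go stale when sales changes and stay stale until materialize is called again. Keeping the stored copy consistent with the base relations is the maintenance cost that accompanies the fast access described above.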
Data warehousing has become increasingly visible as a research issue in the wake of enormous market activity in the past few years. Warehousing is reputed to be the next big corporate information initiative, where every database company hopes to make its fortune. Similarly, materialized views are attracting increased research activity, with applications in decision support, OLAP, query optimization, and replication, all of which are relevant for data warehousing.
What new database problems are opened up by data warehousing? Clearly, warehouses need database systems to support larger and larger amounts of data, running into hundreds of gigabytes and tens of terabytes. Large parallel database systems need to be developed. However, are there problems other than those associated with building any large database system? What about issues of database integration, heterogeneous systems, database loading, batch processing, data snapshots, backups, aggregate query processing, and OLAP query optimization?
Can materialized view technology provide the answer to most or all of these problems? Many people believe so, and claim that warehousing is no more than a new name for caching and materialized views. Many researchers and industry developers have put their time and money behind this belief and are building systems and products based on materialized views.
Can materialized view technology solve the problems encountered when building data warehouses on database systems? What work needs to be done on materialized views to develop such technology and to make it usable? Are there significant warehousing problems outside materialized views?