An HPTS Position Paper by
Tobin J. Lehman
IBM Research - Almaden
650 Harry Road (K55/801), San Jose, CA-95120
April 1, 1997
The world of business is growing more frantic and complex: decision time is shorter, the number of factors contributing to most business deals is larger, and the number of information sources is greater. Having the proper information at hand is the single most important factor in any major business decision. Thus, the key to dealing with today's fast-paced and increasingly complex business world is building and maintaining an understanding of all parts of the business. However, this knowledge is not acquired easily; it is acquired by locating and managing the proper information. And, unfortunately, the trend in information management has gone from fully accessible centralized data to less accessible distributed data.
Information, the lifeblood of an enterprise, was once the sole responsibility of mainframes. Company data, in the form of rigid tabular, textual information, resided in a closely guarded centralized data center. Data processing applications could be directed to access business information residing in files and database systems in a central computing complex. In a sense, this was the golden age of data processing, as enterprise data would never again be so simple and so easy to manage -- although this may not have been the view of a user, since the tools for accessing data were primitive and clumsy.
The arrival of the client/server model changed the organization of the data from one of a centralized system to one of a distributed system. With that came problems. Although some amount of company data became more accessible to the average worker -- it was no longer guarded by the ``glass house'' guards -- the total collection of company data became less accessible to any one person or company agency because there was less than full connectivity to all of the data. Much of the enterprise data was deposited onto either standalone PCs or PCs in disconnected networks. These were the dark times of information management.
In the last few years we have witnessed a new age of connectivity to information. Extranets have allowed individuals access to data both around their intranets and around the world. And the semi-connected world of client/server is evolving into the fully connected world of the peer-to-peer model. Coincident with this connectivity is a newly emerging technology, which brings an entirely new family of computing devices into the connected world. The old model of three tiers of computing (tier 3 is mainframes, tier 2 is servers, and tier 1 is desktop machines) is getting a new tier -- tier 0. Tier 0 represents those devices that are lighter, smaller, and more mobile than traditional desktop machines, such as PDAs and smart cell phones.
We are on the brink of a new era in computing. The cost of an embedded computer (a single-chip computer comprising CPU, RAM, I/O ports and wireless communication) will soon be low enough ($10 range) to embed in most major (and some minor) home, automotive and business appliances. In addition, computing devices (PDAs) [USR Pilot, Apple Newton] and mobile devices (pagers, phones) [Nokia, Motorola] will merge, adopting features from each other to create a family of wireless computing products. A Nokia phone (the 9000 communicator) [Nokia 9000], for example, already supports most features of a PDA. Similarly, the popular USR Palm Pilot (PDA) [USR Pilot] already has numerous wireless modem options. One could expect a future model of the Pilot to have pager and telephony capabilities.
At the first HPTS workshop in 1985, there were many discussions of transactions that involved buying an airline ticket or making a bank withdrawal. These classical transactions were served by very large centralized complexes running high performance transaction processing systems like IMS Fastpath (now IMS/ESA [IMS]), ACP (now TPF [TPF]) and SABRE [Sabre]. In the last 12 years, a significant change has taken place. Now, in this new information age, transactions are no longer only debit/credit operations on a centralized store; they can be any durable exchange of information between two (or more) parties. And, rather than directly involving people, as was often previously the case, these newer transactions may well act completely autonomously. Acting on behalf of a user, but not necessarily under the user's direct control, a wireless USR Pilot (a person's mobile platform) may exchange data with home appliances, with the car, with other PDAs or with public institutions, such as stores, banks or agencies. Each of these interactions may involve a complete range of data types, from basic types like numbers, text, audio, and video to more complex types like applets and encapsulated types. In addition, the set of peers that our Palm Pilot talks to can be large and diverse.
What would we expect of the database system that ran on a tier 0 device? As an example, let's take our favorite mobile platform, the Palm Pilot. In a single day, our Pilot could be exchanging data (executing transactions) with several hundred different systems using a multitude of datatypes. The Pilot would be an information (push) client, our interface to the home/car/store/shopping mall/office, and the general database for all personal information. One of the common uses for the Pilot would be to connect to devices (vending machines, pay phones, ATMs, etc.) to deliver a key for access. And, although the Pilot might not retain large amounts of data (that is, more than tens of megabytes), a large amount of data could conceivably pass through it. Today's Palm Pilot Professional comes with no disk storage (though 1 inch drives could be common in the future) and 1 MB of RAM (upgradable to 2 MB, or even possibly 4 MB with after-market upgrades).
What database system will we run on our Pilot? Or, more appropriately, what are the requirements for a Tier Zero Database Management System (TZDB)?
We believe that the standard Relational or Object-Relational database systems (DB2, Oracle, Sybase, Informix and SQL Server) and the Object database systems (ObjectStore, Versant and Objectivity) do not (and will not) satisfy the above criteria. So, if not the name-brand database systems, then what?
At IBM Research - Almaden, we have a project called TSpaces. TSpaces is a lightweight, network-oriented database system that is tuned to the needs of tier 0 devices. The spirit of TSpaces comes from the Tuplespaces used in the LINDA Project [Gelertner 85, Ahuja 86, Carriero 89]. In the mid-1980s, David Gelernter, a professor at Yale University, created the LINDA programming language, which was designed to address the communication problems of parallel programming. Part of this project was a concept known as Tuplespaces, which were designed to simplify data exchange between parts of a parallel program. Tuplespaces embodied three main principles:
Although the LINDA Tuplespace was originally intended as a global communication buffer for parallel programs (which are concerned mostly with synchronization and high performance), we found that it also works for distributed programs, which often need a ubiquitous persistent communication buffer. The central concept of a LINDA Tuplespace was surprisingly simple -- agents post unstructured tuples to a universally visible Tuplespace, consume tuples, and read tuples. LINDA was a big hit in the parallel programming community [too many references to mention], but it has not enjoyed much visibility outside of that particular research area. More recently, Sun Microsystems has renewed interest in Tuplespaces, calling its version Javaspaces [Javaspaces].
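The three operations named above (post, consume, read) can be illustrated with a minimal in-memory sketch. This is not the LINDA or TSpaces interface: the class name, the method names, and the use of None as a wildcard are assumptions made for illustration only, and the sketch returns None when nothing matches rather than blocking.

```python
# Hypothetical sketch of a Linda-style tuple space; the class and method
# names are illustrative and are NOT the LINDA or TSpaces API.
class TupleSpace:
    def __init__(self):
        self._tuples = []  # the universally visible store

    def post(self, tup):
        """Add a tuple to the space."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(template, tup):
        # None in a template is a wildcard; other fields must be equal.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def read(self, template):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def consume(self, template):
        """Remove and return the first matching tuple, or None."""
        tup = self.read(template)
        if tup is not None:
            self._tuples.remove(tup)
        return tup


space = TupleSpace()
space.post(("thermostat", "kitchen", 21.5))
space.post(("thermostat", "garage", 12.0))

seen = space.read(("thermostat", "kitchen", None))     # tuple stays in the space
taken = space.consume(("thermostat", "garage", None))  # tuple is removed
gone = space.consume(("thermostat", "garage", None))   # nothing left to match
```

The poster and the reader never address each other; they only agree on the shape of the tuple, which is what lets the space act as a shared communication buffer.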
We use TSpaces to simplify data exchange between peer-to-peer components in a generalized network, where the data itself is very dynamic. TSpaces is able to handle dynamically changing data because it does not use (or require) a static data schema definition. And, because it is written in Java, the data transport between all platforms is standard.
TSpaces is a simple data manager that manages self-describing ``tuples''. A tuple is an ordered set of type/data pairs. Applications sharing information agree on the general structure of the data, though new data and new types can be added dynamically. Operations are somewhat simpler than those of a database system. Though the essence of the operations is the same (insert, delete, and query), the queries do not have the complexity of the SQL language. This is considered a plus for those applications that mostly move and store data, rather than execute elaborate queries over it.
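As a rough illustration of self-describing tuples and the insert, delete, and query operations, the following sketch pairs every value with its type and matches templates field by field. The function names and the template convention (a bare type matches any value of that type) are assumptions for this example, not the TSpaces interface.

```python
# Hypothetical sketch: a self-describing tuple is an ordered sequence of
# (type, value) pairs. All names are illustrative, not the TSpaces API.

def make_tuple(*values):
    """Pair each value with its type so the tuple describes itself."""
    return tuple((type(v), v) for v in values)

def matches(template, tup):
    """A template field is either a bare type (match any value of that type)
    or a concrete value (match that exact type and value)."""
    if len(template) != len(tup):
        return False
    for want, (ftype, fvalue) in zip(template, tup):
        if isinstance(want, type):
            if ftype is not want:
                return False
        elif type(want) is not ftype or want != fvalue:
            return False
    return True

store = []  # no schema is declared anywhere

def insert(tup):
    store.append(tup)

def query(template):
    return [t for t in store if matches(template, t)]

def delete(template):
    removed = query(template)
    store[:] = [t for t in store if not matches(template, t)]
    return removed

insert(make_tuple("door", "front", True))
insert(make_tuple("door", "garage", False))
# A tuple with an extra field arrives without any schema change.
insert(make_tuple("door", "front", True, b"\x00\x01"))

three_field_doors = query(("door", str, bool))
deleted = delete(("door", "garage", bool))
```

Matching on position and type is far weaker than SQL, which is the trade-off described above: enough to move and find data, with no query planner or schema catalog to carry on a small device.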
The TSpaces model works well across a wide variety of applications. Consider a simple embedded environment in the home, where a number of computers embedded in appliances communicate via TupleSpace to exchange status information with each other, with the central home security system, with the homeowner's Pilot, or with anyone who cares. Similarly, today's cars have an average of 30 on-board computers (the advanced cars have up to 60) -- most of them not sharing data. As the automotive industry moves toward putting devices on a common data bus and powerline, these devices will be able to communicate with each other or with the central automotive security system (which then communicates with the driver's PDA). All information exchange can be done via a TupleSpace system.
Consider the Pilot that fits in your shirt pocket. You need the latest map information, corresponding to your current location, downloaded into it. Do you need to know the IP address or the URL of the map server? No. You just need to issue the request, ``Map needed for the following coordinates'' to the mobile TupleSpace server. The map server, listening for query requests, responds, putting the answer back into TupleSpace. The answer may be in the same format that it was in last week, or it may now contain new audio data that correspond to various interest points on the map. The TupleSpace is able to serve up map data to your Pilot, along with the new audio handling methods, if needed.
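The map scenario can be sketched as a request tuple and an answer tuple passing through a shared space. Everything specific here is invented for illustration: the tuple tags, the field layout, the placeholder payload, and the polling-style server step. The sketch also leaves out the delivery of new handling methods along with the data.

```python
# Hypothetical sketch of anonymous request/response through a shared space.
# Tuple layouts, tags, and the map payload are invented for illustration.
space = []  # stand-in for the mobile TupleSpace server

def take(tag):
    """Remove and return the first tuple whose first field equals tag."""
    for i, tup in enumerate(space):
        if tup[0] == tag:
            return space.pop(i)
    return None

def client_request_map(lat, lon):
    # The client names what it needs, not who should answer.
    space.append(("map-request", lat, lon))

def map_server_step():
    """One polling step of a map server listening for requests."""
    request = take("map-request")
    if request is None:
        return False
    _, lat, lon = request
    # The answer carries named fields, so new ones (audio) can appear later
    # without breaking clients that only look at the fields they know.
    answer = {"tiles": f"tiles@{lat},{lon}", "audio": ["interest point 1"]}
    space.append(("map-answer", lat, lon, answer))
    return True

client_request_map(37.21, -121.81)
served = map_server_step()
reply = take("map-answer")
```

Neither side holds an address for the other; replacing or adding map servers changes nothing on the client.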
Our experience with TSpaces so far is that it is lightweight and flexible, yet it is powerful enough to serve as a real information store. It is a reasonable system to manage information in your home, your car, your PDA and your PC. It can easily interface with more sophisticated stores, such as a fully functional relational database system, to satisfy the information needs of any client. In fact, a TupleSpace works quite well as an interface to a database system, where the initial query and the answer set are both transmitted via tuples.
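One way to picture a tuple space fronting a relational database is a small gateway that takes a query tuple, runs it against the store, and posts each result row back as a tuple. The sketch below uses Python's standard sqlite3 module as a stand-in relational system; the tuple layout, the table, and the gateway function are assumptions for illustration and do not describe how TSpaces does this.

```python
import sqlite3

# Hypothetical sketch: a gateway takes query tuples from a shared space,
# runs them against a relational store, and posts the answer set as tuples.
# The tuple layout and the table are invented for illustration.
space = []

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (device TEXT, value REAL)")
db.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("oven", 180.0), ("fridge", 4.0), ("oven", 200.0)],
)

def gateway_step():
    """Serve one pending ("query", id, device) tuple, if any.
    Returns the number of answer tuples posted."""
    for i, tup in enumerate(space):
        if tup[0] == "query":
            _, query_id, device = space.pop(i)
            rows = db.execute(
                "SELECT device, value FROM readings "
                "WHERE device = ? ORDER BY value",
                (device,),
            ).fetchall()
            # One answer tuple per row, tagged with the originating query id.
            for row in rows:
                space.append(("answer", query_id) + row)
            return len(rows)
    return 0

space.append(("query", 1, "oven"))
row_count = gateway_step()
answers = [t for t in space if t[:2] == ("answer", 1)]
```

The client sees only tuples going in and coming out; the SQL stays on the gateway side, next to the database that understands it.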
The small mobile or embedded computers (PDAs, sensors, pay phones, vending machines, home appliances, office equipment, etc.), often referred to as tier 0 devices, represent a major change in how our computing infrastructure will function. Rather than use the desktop model of directed point-to-point communication over a LAN, tier 0 device communication will be autonomous and often anonymous. In addition, the data management needs of these new devices will not be heavily query-oriented (no need for a heavyweight query language like SQL), but will instead be more information-transaction-oriented. A new system, TSpaces, can connect the tier 0 devices and satisfy their data management needs, while keeping a connection to the tier 1, 2 and 3 information stores.
Many corporations will be incorporating tier 0 devices into their computing networks, as these devices are often the information generators upon which the company's business data is based. It is essential for the livelihood of the business that the data can flow from the tier 0 devices all the way back to the corporate data warehouse, where it can be analyzed with modern data mining techniques. The TSpaces tier 0 database provides the data conduit to connect tier 0 devices to each other and to the main business computer networks.