Introduction

    This diploma thesis describes how to manage, negotiate, and transfer personal information on the World Wide Web (WWW, or Web). This chapter provides a brief overview of the Web and how personal information is used in online transactions. It also describes some of the current issues and problems related to privacy on the Web. The end of this chapter explains the goals of this thesis and gives an overview of the remainder of this document.

Using the World Wide Web

    As its popularity grows, the Web will continue to evolve from an easy way of accessing (and publishing) information on the Internet into a virtual marketplace where everything can be bought or sold, just as in the physical world. As in the early days of the Web, Web browsers are still used to retrieve and display information from Web servers. Nowadays, though, they can also be used to purchase goods or send electronic mail (email), and this way of using the Web is becoming more and more popular. A recent survey [eMarketer98] (see footnote 1) shows that the Web is currently used by approximately 36 million people worldwide and that this number is expected to increase to 142 million by the year 2002. Although the estimated total online consumer revenues for Electronic Commerce on the Web (eCommerce) were $1.5 billion in 1997, this pales in comparison to sales in the home shopping, catalog, and retail industries. Currently, the growth of eCommerce cannot keep up with the growth of the Web itself. The reasons why people do not purchase on the Web vary, but two major issues are privacy and security [GTRC98].

Personal Information and Privacy

"Privacy: ... 1. the state of being private; retirement or seclusion. 2. freedom from the intrusion of others in one's private life or affairs: the right of privacy. 3. secrecy ..." [Webster97]

    When Web sites started publishing information in the early 1990s, the exchange of information was uni-directional. People browsed the Web and received information anonymously, with no obvious threats to a person's privacy. Since then, traffic on the Web has changed substantially and has become a give-and-take process. With the emergence of online services and eCommerce, many Web sites now keep track of their visitors and collect information about them for various reasons. These reasons include personalizing Web sites for different customers, helping with online sales or services, and tracking demographics, among others. Web sites often require a visitor to register on his first visit. On subsequent visits, the same Web sites generally ask that user to log in using his registration information, or they use cookies [1] (see footnote 2) to re-identify the visitor.

    When a person enters into an online transaction, he is usually asked to give out personal information. Releasing personal information over the Internet exposes it to several threats to privacy (unless the user provides false information):

  1. Eavesdroppers could watch transactions and extract personal information.
  2. Web sites could misuse information about a person, including giving or selling it to others who may misuse it.
  3. Unrelated third parties could have access to a person's information if it is stored with insufficient protection by the Web site.

    The existence of these threats demonstrates the need for a mechanism to protect users' privacy when browsing the Web.

Security

    Some may argue that security could provide privacy protection because security is privacy. However, there is a difference between security and privacy:

"... Information is secure if the owner of information can control that information. Information is private if the subject of information can control that information. Anonymous information has no subject, and thus ensures that information is private. Anonymity requires security and guarantees privacy, but is neither. ..." [Camp97]

    In order to ensure privacy, there needs to be security. The market already provides several security tools, such as the Secure Sockets Layer (SSL) [2] protocol developed by Netscape. Another example is Pretty Good Privacy (PGP; see footnote 3) [3]. Both use Rivest-Shamir-Adleman (RSA) public key cryptography. Such security tools can help protect privacy by preventing non-authorized parties from accessing the information. But privacy requires more than that. There also need to be ways of controlling the access to and the distribution of information. The following example illustrates why privacy requires more than security:

    Person B orders a book at an online book store by filling out a form with his name, address, and credit card information. If this information is sent over an SSL connection between B's computer and the online book store, it is protected during transmission: an eavesdropper cannot read or alter it, and only the online book store can read it. The online book store will then use this information to finish the transaction, which may include the release of parts of B's information to a third party (who actually ships the book to B) over a secure connection.

    If the third party in this example also sells the information to advertisers or other companies, then B's privacy might be violated even though every step of the transaction was secure.

Current Problems

    Besides privacy problems regarding legal protection [Camp97], there are several other types of problems. During online transactions, Web sites can gather a lot of information, which can be either personal information or information derived by tracking people's online activities. People are concerned about the privacy of such information [Wang98] because it is often difficult for them to learn about a Web site's information practices. Some Web sites have started publishing their privacy policies online, but in many cases people cannot find them, do not trust them, or simply do not understand them. Thus, people often do not know the consequences of releasing personal information. Survey results indicate that this leads to irritation and mistrust. According to [GTRC98], 39.1% of the respondents do not believe that their credit card information will be secure when released online, and 26.9% do not believe that personal information will be kept private. In order to find out about users' experiences with such privacy policies, a small survey (see Appendix A on page 89) was conducted for this thesis. The survey respondents criticized several aspects of such privacy policy statements:

  1. Privacy policy statements are too long, too complicated, or both.
  2. Respondents missed a standard for privacy policies and would prefer that policies follow the same form across the Web.

    These findings agree with the statement that, as computers are used for more tasks and are integrated with more services, people will need help with the information and work overload [Maes97].

Online Transactions

    In addition to the problems regarding privacy policy statements, the survey respondents criticized several other aspects of releasing personal information in online transactions. Respondents mentioned that it is often unclear why and for what purpose a Web site collects personal information during an online transaction. For example, it is not obvious why a Web site wants to collect a person's phone number while offering a subscription to a mailing list. One reason is that such information is very valuable to Web sites, especially to those that offer free Web services. The collected information can be used for advertising or marketing. Fifty percent of the respondents of our survey would like to give out only the minimum amount of information to a Web site. All of these problems indicate that people need help and better protection regarding privacy on the Web.

Motivation and Goals

    The question now is how to provide such help. Information about the privacy practices of Web sites is needed, as well as an infrastructure to get to it. The World Wide Web Consortium (W3C) [4] is currently working on this problem with its Platform for Privacy Preferences Project (P3P) [5]. P3P provides a framework for informed online interactions. Its goal is to enable Web users to exercise preferences over the use of their personal information. P3P-compliant applications inform users about Web sites' privacy practices and allow them to delegate decisions and tasks to their computer agents. Such tasks could include the automated transfer of personal information during an online transaction, which is supported by the P3P protocol. The W3C believes that P3P can help increase people's confidence in online transactions by presenting them with useful and understandable information about Web sites' privacy practices [Reagle99]. Parallel to P3P, the W3C has a project called A P3P Preference Exchange Language (APPEL) [6]. This language can be used to express a person's preferences. Appendix B on page 99 and Appendix C on page 109 at the end of this document provide short introductions to P3P and APPEL, respectively.

    Thus, it will soon be possible to automatically inform users about a Web site's privacy policy. We decided to implement P3P and APPEL in order to demonstrate how these technologies can help protect a person's privacy when browsing the Web and performing online transactions. This basic implementation would allow a user to

  1. automatically obtain Web sites' privacy policies,
  2. automatically evaluate and check these policies against his personal preferences, and
  3. receive assistance during online transactions, such as computer-issued warnings or recommendations, or allow the computer to finish the transaction on his behalf.

    In real life, however, there is more to business transactions than just saying yes or no and providing information. People negotiate contracts with special terms and conditions. In order to find out how this behavior can be implemented in online transactions, our survey included the following question (see also Section A.6 on page 96):

Assume you could use a system that can automatically obtain Web sites' privacy policies and check and evaluate them against your personal needs and preferences. How would you configure such a system, i.e., what would your preferences look like regarding the release of personal information?

    The two most common answers were to give out only the minimum set of information that is needed for the transaction and to accept a transaction only if the Web site promises not to resell the user's information. Otherwise, the system should abort the transaction or warn the user. Keeping in mind these preferences and the desired functionality of our basic implementation, we envisioned a software agent (see footnote 4), the Online Privacy Agent (OPA), that can negotiate the terms and conditions of online transactions. In addition, an OPA would be able to keep track of online transactions and their respective terms and conditions.

Usage Scenarios

    The users of an OPA can benefit from it in multiple ways. The following scenarios illustrate some of these benefits.

Seamless Transfer of Information

    One of the features of using P3P in combination with APPEL is the automated transfer of personal information. Many online transactions require the input of the same information, such as email address, name, and home address. Using an OPA, the terms and conditions of the transactions can be checked against the user's preferences. If this check fails, the OPA will notify the user with a warning that participation in the current transaction can violate his privacy. On the other hand, if the check succeeds, then the OPA can go ahead and seamlessly send the required information back to the Web site. This would lessen the burden on users to type in the same information multiple times. However, it is still necessary to give the user control over the release of information. (Some people might feel uncomfortable with the fact that a piece of software is giving out personal information.) The following chapters will describe how both of these goals can be accomplished with the Online Privacy Agent.
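
    The check-then-transfer flow described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: SitePolicy, UserPreferences, check_policy, and handle_request are hypothetical names invented here, and the two data classes are plain stand-ins that do not follow the actual P3P or APPEL vocabularies.

```python
from dataclasses import dataclass


@dataclass
class SitePolicy:
    """Hypothetical stand-in for a site's declared practices (not P3P syntax)."""
    requested: set[str]      # names of the data fields the site asks for
    resells_data: bool       # whether the site may pass the data on


@dataclass
class UserPreferences:
    """Hypothetical stand-in for a user's preferences (not APPEL syntax)."""
    releasable: set[str]     # fields the user is willing to release
    allow_resale: bool = False
    confirm_before_send: bool = True   # keep the user in control of releases


def check_policy(policy, prefs):
    """Return the reasons why the policy conflicts with the preferences."""
    problems = []
    blocked = policy.requested - prefs.releasable
    if blocked:
        problems.append("requests withheld data: " + ", ".join(sorted(blocked)))
    if policy.resells_data and not prefs.allow_resale:
        problems.append("site may resell personal data")
    return problems


def handle_request(policy, prefs, profile, confirm=lambda fields: True):
    """Warn on a failed check; otherwise release the requested fields."""
    problems = check_policy(policy, prefs)
    if problems:
        return ("warn", problems)
    fields = sorted(policy.requested)
    # Ask the user first if he wants the final say over every release.
    if prefs.confirm_before_send and not confirm(fields):
        return ("declined", [])
    # Only fields that actually exist in the stored profile are sent.
    return ("send", {name: profile[name] for name in fields if name in profile})
```

    The confirm callback is one possible way to reconcile the two goals named above: the agent fills in the data, but a user who sets confirm_before_send still approves each release.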

Negotiating Terms and Conditions of a Transaction

    In many online transactions, Web sites ask for personal information. As mentioned earlier, the kind of information requested is not always relevant to the transaction itself. With the emergence of P3P and APPEL, we believe that a Web site might soon want to grant a visitor access to its Web resources based on the amount of information it can get from the visitor. Another example would be a Web site that offers discounts on purchases in its online store. The Web site might offer higher discounts if it is allowed to sell users' address information to advertisers. In both cases, an OPA would be a helpful assistant that automatically negotiates the terms and conditions for a user when registering with or purchasing goods from Web sites. The OPA would apply the user's preferences in the online transaction and try to negotiate an agreement with the Web site in one or more rounds.

    For example, an OPA is more likely to abort a transaction if its user is more reserved regarding the release of information online. On the other hand, if the user is more liberal about the release of information, the OPA is more likely to give out as much information as necessary in order to successfully finish as many transactions as possible. The negotiation terms are related to the amount of information to be released or to the conditions of the current transaction. The advantage of OPA-enabled negotiation is that it frees the user from having to read privacy policy statements for different transactions.
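
    To make the difference between a reserved and a liberal user concrete, the following sketch walks through a multi-round exchange. It is not the negotiation framework of this thesis; the function negotiate, its parameters, and the representation of an offer as a (requested fields, resells) pair are assumptions made purely for illustration.

```python
def negotiate(offers, releasable, allow_resale, max_rounds=3):
    """Accept the first site offer that fits the user's preferences.

    offers       -- successive proposals from the site, one per round, each a
                    (requested_fields, resells) pair (hypothetical format)
    releasable   -- set of fields the user is willing to release
    allow_resale -- whether the user accepts resale of his data
    Returns ("accept", round_number) or ("abort", None).
    """
    for round_no, (requested, resells) in enumerate(offers, start=1):
        if round_no > max_rounds:
            break  # give up rather than negotiate indefinitely
        fields_ok = set(requested) <= releasable
        resale_ok = allow_resale or not resells
        if fields_ok and resale_ok:
            return ("accept", round_no)
    return ("abort", None)


# A site that lowers its demands from round to round.
offers = [
    ({"email", "phone"}, True),
    ({"email", "phone"}, False),
    ({"email"}, False),
]

reserved = negotiate(offers, releasable={"email"}, allow_resale=False)
liberal = negotiate(offers, releasable={"email", "phone"}, allow_resale=True)
```

    With these inputs the liberal user reaches an agreement in the first round, whereas the reserved user accepts only the third offer and would abort if fewer rounds were allowed.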

Transaction History

    There is currently no automated way of keeping track of the online transactions in which a user has participated (for example, registering with an airline Web site or purchasing a book). In some cases, the user might get a confirmation number at the end of the transaction, a user identification and password pair, or a confirming email. The user currently has to store this information in a notebook or write it down in his organizer in case he needs it later. The online transaction information ends up in many places, whereas it should be stored in one place where it is easy to find. An OPA can help keep track of such information and store it together with the terms and conditions of the transaction. This information can be used and transferred seamlessly in subsequent transactions with Web sites. A good example is a subsequent visit to a Web site that requires a user password. With an OPA, there is no need for the user to remember the identification and password pair for a particular Web site. This will be especially useful with the increasing number of Web sites requiring user identification.
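
    A transaction history of this kind could be modelled roughly as follows. The sketch assumes an in-memory store, and TransactionRecord and TransactionHistory are hypothetical names that do not describe the OPA's actual data model. A real agent would also have to protect stored credentials, which is left out here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TransactionRecord:
    """One completed online transaction (hypothetical record layout)."""
    site: str
    description: str                      # e.g. "purchased a book"
    released: dict                        # personal data given to the site
    terms: list                           # terms and conditions agreed to
    confirmation: Optional[str] = None    # confirmation number, if any
    credentials: Optional[tuple] = None   # (user id, password), if any
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class TransactionHistory:
    """Keeps all records in one place instead of notebooks and organizers."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def for_site(self, site):
        """All transactions with one site, oldest first."""
        return [r for r in self._records if r.site == site]

    def credentials_for(self, site):
        """Most recently stored credentials for a site, or None."""
        for record in reversed(self._records):
            if record.site == site and record.credentials is not None:
                return record.credentials
        return None
```

    On a return visit, a lookup such as credentials_for("airline.example") would let the agent supply the stored identification and password pair instead of asking the user to remember it.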

Thesis Overview

    This thesis illustrates and describes the Online Privacy Agent (OPA), a software agent to manage, negotiate and transfer personal information on the Web. As described in the previous sections, an OPA covers various aspects of privacy and personal information in online transactions. This thesis will focus on the aspects of negotiation and transfer of personal information using P3P and APPEL and discuss them in detail. Other aspects such as tracking online transactions will be discussed briefly throughout this document.

    Chapter 2, Agent Architecture, describes the OPA's architecture and its components. It also illustrates the OPA's usage and its current implementation. At the end of Chapter 2, a short overview of agent technology is given, including a comparison of the OPA to existing agent technology.

    Chapter 3, Management and Transfer of Personal Information, provides an overview of the OPA's context and briefly introduces the Hypertext Transfer Protocol (HTTP), P3P, and APPEL. Furthermore, Chapter 3 explains how the OPA can be used to manage a user's personal information and to keep track of online transactions. The chapter then goes into finer detail and uses a functional model to describe how the OPA monitors, manipulates, and transfers information.

    Chapter 4, Negotiation of Personal Information, describes the OPA's ability to negotiate during online transactions. We will describe the concept of negotiating personal information (see footnote 5) by introducing a framework for the automated negotiation of sets of information. The chapter closes with a description of how the framework was applied in the context of online transactions by means of P3P and APPEL.

    Chapter 5, Conclusion, provides a summary of this document and describes possibilities for future work.

    In each of the chapters, this document refers to the OPA's current implementation. We describe the OPA's usage (Chapter 2) and its architecture (Chapters 2 and 3), as well as the implementation of individual components (Chapter 4). Several appendices at the end of this thesis provide summaries, brief introductions, and overviews of various aspects of this thesis. See Appendix A for details about the survey that was conducted as a part of this thesis to find out about Web users' experiences with privacy and personal information in online transactions. A brief introduction to P3P can be found in Appendix B. Appendix C provides an overview of APPEL. Appendix D illustrates a sample APPEL ruleset.


1. Throughout this document, books, papers, articles, or magazines are referenced by the author's last name or the company's name, and the year of publication in brackets (see Bibliography on page 123).

2. Throughout this document, Web resources are referenced by numbers in brackets (see World Wide Web References on page 125).

3. Although PGP's name suggests that it is a privacy tool, it really is a security tool.

4. The term agent is a commonly used term in computer science, although there is no consensus definition for it. [Nwana96] and [Bradshaw97] provide overviews of the types and characteristics of existing agent technologies.

5. The aspect of (multi-round) negotiation was recently removed from the current P3P specification because the W3C considered it to be too complex and to be a reason for Web sites not to deploy P3P. However, this document will show that multi-round negotiation is useful to the user (and the service), and can be implemented with reasonable amounts of effort.


April 9, 1999 · Jörg Meyer · jmeyer@almaden.ibm.com