Intermediary-based Transcoding Framework

Steven C. Ihde   Paul P. Maglio   Jörg Meyer   Rob Barrett
IBM Research - Almaden
650 Harry Rd.
San Jose, CA 95120


With the rapid increase in the amount of content on the World Wide Web (WWW), it is now becoming clear that information cannot always be stored in a form that anticipates all of its possible uses. One solution to this problem is to create transcoding intermediaries that convert data from one form to another on demand. Up to now, these transcoders have often been constructed to convert one particular data format to another particular data format (e.g., [5,6]). A more flexible approach is to create reusable transcoding operations that can be composed as needed. We describe a formal framework for document transcoding that is meant to simplify the problem of composing transcoding operations. By specifying the capabilities of each operation in a uniform way, our framework can guarantee that it correctly combines operations to convert arbitrary input formats to arbitrary output formats.


To develop our architecture, we first provide a few definitions. A data object represents content to be transformed (i.e., a sequence of bytes). A type indicates the form in which data are represented (including information about the kind of data object and the way in which bytes are encoded, such as "image/gif"). Properties represent attributes of particular data types (for instance, the type "text/xml" might have the property "DTD", which could take on values such as ""). A format combines a type and a set of properties, such as ("text/xml", (("DTD", "foo"))), indicating that this particular set of bytes (data object) is encoded as "text/xml" (type) with DTD "foo" (property).
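
The definitions above map naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `Format`, the field names `media_type` and `properties`, and the helper `get` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Format:
    """A format: a type plus a set of (name, value) properties.

    All names here are assumptions made for this sketch.
    """
    media_type: str                                  # e.g. "image/gif"
    properties: Tuple[Tuple[str, str], ...] = ()     # e.g. (("DTD", "foo"),)

    def get(self, name: str) -> Optional[str]:
        """Return the value of the named property, or None if it is absent."""
        for key, value in self.properties:
            if key == name:
                return value
        return None


# The example format from the text: "text/xml" with DTD "foo".
xml_foo = Format("text/xml", (("DTD", "foo"),))

# A format with a type and no properties.
gif = Format("image/gif")
```

Because the class is frozen and holds only tuples and strings, two formats with the same type and properties compare equal and can be used as dictionary keys.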

Transcoding takes a data object in a format that is convenient for the supplier of the object and converts it into a data object in a format that is convenient for the consumer of the object. It does not matter whether this happens at the supplier, at the consumer, or somewhere in between, such as at a specially designed intermediary [1,2]. Intermediaries are particularly well suited to the task, as they can be operated by a neutral third party, or set up by either supplier or consumer to avoid changes to existing systems. To this end, we developed an architecture for intermediary-based transcoding (for an alternative approach, see [3,4]). Our architecture is modular, allowing developers to separate functionality into well-defined units. It is also pluggable, meaning that units of functionality can be combined in ways not foreseen by their authors to achieve new transformations.

Architecturally, we break a transcoding operation down into several steps:

  1. Individual transcoders advertise their capabilities to a "master transcoder".
  2. Some outside entity makes a request to the master transcoder.
  3. The master transcoder arranges for appropriate individual transcoders to perform the work.

The result of this breakdown is that individual transcoders can be composed to perform any transcoding operation that can be achieved by chaining them.

Advertising Capabilities

Each transcoder enumerates one or more transcoding capabilities. Each capability lists an input format and an output format. For example, a simple transcoder designed to transcode HTML pages into WML for display on a cell phone might advertise a capability this way:

Input Format        Output Format

A transcoder designed to shrink or expand GIF images might specify this capability:

Input Format        Output Format
(none)              ("X", "*")
                    ("Y", "*")

A transcoder designed to help several systems using different XML DTDs to describe the same type of data work together might advertise several capabilities:

Input Format        Output Format
("DTD", "dtd1")     ("DTD", "dtd2")

Input Format        Output Format
("DTD", "dtd2")     ("DTD", "dtd1")
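
As an illustration of how such capabilities might be represented and checked, here is a minimal sketch. Several things in it are assumptions rather than facts from the text: the names `Capability`, `props_match`, and `can_produce`; the reading of "*" as a wildcard that accepts any requested value; and the types "image/gif" and "text/xml", which are supplied here only because the capability tables above show properties without types.

```python
from dataclasses import dataclass
from typing import Tuple

Props = Tuple[Tuple[str, str], ...]   # a set of (name, value) properties


@dataclass(frozen=True)
class Capability:
    """One advertised capability: an input format and an output format."""
    input_type: str
    input_props: Props
    output_type: str
    output_props: Props


def props_match(advertised: Props, requested: Props) -> bool:
    """True if every requested property is advertised with the same value,
    or with "*" (assumed here to mean "any value")."""
    adv = dict(advertised)
    for name, value in requested:
        if name not in adv:
            return False
        if adv[name] != "*" and adv[name] != value:
            return False
    return True


def can_produce(cap: Capability, out_type: str, out_props: Props) -> bool:
    """True if the capability's output format covers the requested output."""
    return cap.output_type == out_type and props_match(cap.output_props, out_props)


# The image-resizing capability: no input properties, any X and Y on output.
resize = Capability("image/gif", (), "image/gif", (("X", "*"), ("Y", "*")))

# One direction of the DTD-to-DTD capability.
dtd1_to_dtd2 = Capability("text/xml", (("DTD", "dtd1"),),
                          "text/xml", (("DTD", "dtd2"),))
```

Under these assumptions, `can_produce(resize, "image/gif", (("X", "50"), ("Y", "34")))` holds because both requested values are covered by the wildcard, while `dtd1_to_dtd2` cannot produce a document with DTD "dtd1".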

Requesting Operations

Before the master transcoder receives a request, some party external to the transcoding system will have determined the object's current format and its desired output format. Given a request, the master transcoder examines the capabilities of each transcoder to find a single transcoder or a set of transcoders that can perform the requested operation. Once the master has selected the appropriate set, each is invoked in turn with two inputs: (a) the output of the previous transcoder (or the original input, in the case of the first transcoder selected); and (b) a "transcoder operation", which is a request to perform one or more of the operations advertised in a transcoder's capabilities statement. Every transcoder operation specifies the input format of the object being supplied, and the desired output format of the object to be produced.

To perform its operation, the transcoder should produce an output object of the specified type that is equivalent to the input object, except as specified by the request's output properties. For example, a transcoder that advertised the capability to alter the X and Y dimensions of an image might receive this request:

Input Format        Output Format
                    ("X", "50")
                    ("Y", "34")

To transcode a document from one DTD to another, the request might look like:

Input Format        Output Format
("DTD", "dtd2")     ("DTD", "dtd1")
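
The invocation pattern described in this section, in which each selected transcoder receives the previous transcoder's output together with a transcoder operation, can be sketched as follows. This is an illustrative sketch only: the function `run_chain`, the type aliases, the two toy transcoders, and the types "text/a" and "text/b" are hypothetical and stand in for real transcoders.

```python
from typing import Callable, List, Tuple

# A format is (type, properties); an operation is (input format, output format).
Format = Tuple[str, Tuple[Tuple[str, str], ...]]
Operation = Tuple[Format, Format]
Transcoder = Callable[[bytes, Operation], bytes]


def run_chain(data: bytes, steps: List[Tuple[Transcoder, Operation]]) -> bytes:
    """Invoke each transcoder in turn, feeding it the previous output
    (or the original input for the first transcoder) and its operation."""
    for transcoder, operation in steps:
        data = transcoder(data, operation)
    return data


# Two toy transcoders that only demonstrate the data flow.
def upper(data: bytes, operation: Operation) -> bytes:
    return data.upper()


def tag_with_output_type(data: bytes, operation: Operation) -> bytes:
    _input_format, output_format = operation
    return data + b"|" + output_format[0].encode()


a: Format = ("text/a", ())
b: Format = ("text/b", ())
result = run_chain(b"abc", [(upper, (a, a)), (tag_with_output_type, (a, b))])
```

Passing the operation alongside the data lets one transcoder serve several advertised capabilities, since the operation tells it which conversion is wanted on this call.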

Selecting Transcoders

The master transcoder's job is to hide the details of transforming an object from the requestor, thereby letting the requestor concentrate on determining what format the object should be in. Because the capabilities of each transcoder are described in a formal language, the master transcoder can consider every transcoder without any built-in understanding of the formats involved. The master transcoder need only apply simple pattern-matching rules to find a transcoder that can satisfy the request. In some cases, a request can be satisfied by a single transcoder. In other cases, this is not possible. Here again the formal language used to describe the capabilities of each transcoder is a plus, as it enables the master transcoder to compose the operations of different transcoders to accomplish operations that were not foreseen by their authors. In essence, the master transcoder tries to find a chain through the pool of available transcoders, matching output formats to input formats, until the request is satisfied (see Figure 1).

Figure 1: The master transcoder selects an appropriate path through the pool of available transcoders.
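
One way to find such a chain is a breadth-first search over formats. The text does not say which search strategy the master transcoder uses, so the following is only a sketch under stated assumptions: the function name `find_chain` and the transcoder names are hypothetical, "dtd3" is an invented third DTD, and formats are matched by exact equality, which ignores wildcard properties such as "*".

```python
from collections import deque


def find_chain(capabilities, source, target):
    """Return the names of a shortest chain of transcoders that converts
    `source` into `target`, or None if no chain exists.

    `capabilities` is a list of (name, input_format, output_format) triples.
    Formats are hashable values compared by exact equality (a simplification).
    """
    if source == target:
        return []
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        for name, cap_in, cap_out in capabilities:
            # Follow only capabilities whose input matches the current format,
            # and skip formats that have already been reached.
            if cap_in != fmt or cap_out in seen:
                continue
            new_path = path + [name]
            if cap_out == target:
                return new_path
            seen.add(cap_out)
            queue.append((cap_out, new_path))
    return None


gif = ("image/gif", ())
jpeg = ("image/jpeg", ())
xml1 = ("text/xml", (("DTD", "dtd1"),))
xml2 = ("text/xml", (("DTD", "dtd2"),))
xml3 = ("text/xml", (("DTD", "dtd3"),))   # hypothetical third DTD

pool = [
    ("gif_to_jpeg", gif, jpeg),
    ("dtd1_to_dtd2", xml1, xml2),
    ("dtd2_to_dtd1", xml2, xml1),
    ("dtd2_to_dtd3", xml2, xml3),
]

chain = find_chain(pool, xml1, xml3)
```

Here no single transcoder converts "dtd1" to "dtd3", but the search chains `dtd1_to_dtd2` and `dtd2_to_dtd3` by matching the output format of the first to the input format of the second. Breadth-first search returns a chain with the fewest transcoders; a real system might instead weigh cost or quality loss per step.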


Because transcoding is an intermediary application, we built our transcoding framework on top of WBI (see [1,2,7]). In particular, the transcoding framework is implemented as a WBI plugin that consists of the master transcoder and various specific transcoders (such as a GIF-to-JPEG transcoder, or an XML-to-XML converter based on XSL processing). In WBI terms, the master transcoder is a document editor that receives the original object (e.g., GIF) as input and produces a modified object (e.g., JPEG) as output according to some requirements. The master transcoder sits in the data stream between client and server. For each object that flows along this stream, WBI calls the master transcoder so that it may inspect the request and the original object and make an appropriate response. If transcoding is necessary, the master transcoder determines the appropriate transcoder or combination of transcoders and arranges for them to be called in the correct order.

References


  1. Barrett, R. & Maglio, P. P. (1999). Intermediaries: An approach to manipulating information streams. IBM Systems Journal, 38, 629-641.

  2. Barrett, R. & Maglio, P. P. (1998). Intermediaries: New places for manipulating and producing web content. Computer Networks and ISDN Systems, 30, 509-518.

  3. Fox, A. & Brewer, E. A. (1996). Reducing WWW latency and bandwidth requirements by real-time distillation. In Proceedings of the Fifth International World Wide Web Conference (WWW5).

  4. Fox, A., Gribble, S. D., Chawathe, Y., & Brewer, E. A. (1998). Adapting to network and client variation using active proxies: Lessons and perspectives. IEEE Personal Communications.

  5. Smith, J. R., Mohan, R. & Li, C. (1998). Transcoding internet content for heterogeneous client devices. In Proceedings of IEEE Conference on Circuits and Systems (ISCAS).

  6. Tudor, P. N. & Werner, O. H. (1997). Real-time transcoding of MPEG-2 video bit streams. In IEEE Conference Publication of International Broadcasting Convention 1997, pp. 286-301.

  7. WBI Programming Tutorial. Available as