To develop our architecture, we first provide a few definitions. A data object represents content to be transformed (i.e., a sequence of bytes). A type indicates the form in which data are represented (including information about the kind of data object and the way in which bytes are encoded, such as "image/gif"). Properties represent attributes of particular data types (for instance, the type "text/xml" might have the property "DTD", which could take on values such as "http://www.w3.org/TR/1999/PR-xhtml1-19990824"). A format combines a type and a set of properties, such as ("text/xml", (("DTD", "foo"))), indicating that this particular set of bytes (data object) is encoded as "text/xml" (type) with DTD "foo" (property).
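As an illustration of these definitions, a format can be modeled as a small immutable value that pairs a type with its properties. The following is a minimal sketch in Python. The class name `Format` and the helper `get` are hypothetical names chosen for this example and are not part of the framework itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Format:
    """A type plus a set of (name, value) properties (hypothetical model)."""
    type: str
    properties: Tuple[Tuple[str, str], ...] = ()

    def get(self, name: str, default: Optional[str] = None) -> Optional[str]:
        # Look up a single property value by name.
        return dict(self.properties).get(name, default)


# The example from the text: bytes encoded as "text/xml" with DTD "foo".
xml_foo = Format("text/xml", (("DTD", "foo"),))
gif = Format("image/gif")  # a format with no properties

print(xml_foo.type)        # text/xml
print(xml_foo.get("DTD"))  # foo
```

Making the value immutable and hashable lets formats be compared and used as dictionary keys, which is convenient when formats are matched against one another.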
Transcoding takes a data object in a format that is convenient for the supplier of the object and converts it into a data object in a format that is convenient for the consumer of the object. It does not matter whether this happens at the supplier, at the consumer, or somewhere in between, such as at a specially designed intermediary [1,2]. Intermediaries are particularly well suited to the task, as they can be operated by a neutral third party, or set up by either supplier or consumer to avoid changes to existing systems. To this end, we developed an architecture for intermediary-based transcoding (for an alternative approach, see [3,4]). Our architecture is modular, allowing developers to separate functionality into well-defined units. It is also pluggable, meaning that units of functionality can be combined in ways not foreseen by their authors to achieve new transformations.
Architecturally, we break a transcoding operation down into several steps:
Each transcoder enumerates one or more transcoding capabilities. Each capability lists an input format and an output format. For example, a simple transcoder designed to transcode HTML pages into WML for display on a cell phone might advertise a capability this way:
| Input Format | Output Format |
|---|---|
| text/html | text/wml |
| (none) | (none) |
A transcoder designed to shrink or expand GIF images might specify this capability:
| Input Format | Output Format |
|---|---|
| image/gif | image/gif |
| (none) | ("X", "*") ("Y", "*") |
A transcoder designed to help several systems using different XML DTDs to describe the same type of data work together might advertise several capabilities:
| Input Format | Output Format |
|---|---|
| text/xml | text/xml |
| ("DTD", "dtd1") | ("DTD", "dtd2") |
| Input Format | Output Format |
|---|---|
| text/xml | text/xml |
| ("DTD", "dtd2") | ("DTD", "dtd1") |
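The capabilities tabulated above can be written down as plain data and checked against a request with a simple pattern match. The sketch below is one possible reading, not the framework's actual representation. The names `Capability`, `props_match`, and `can_handle` are hypothetical, and we assume that a property value of "*" in a capability accepts any requested value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """One advertised input-format/output-format pair (hypothetical model)."""
    in_type: str
    out_type: str
    in_props: tuple = ()
    out_props: tuple = ()


def props_match(pattern, actual):
    # Assumption: every advertised property must appear in the request,
    # and "*" in the capability accepts any value.
    actual = dict(actual)
    return all(name in actual and value in ("*", actual[name])
               for name, value in pattern)


def can_handle(cap, in_type, in_props, out_type, out_props):
    return (cap.in_type == in_type and cap.out_type == out_type
            and props_match(cap.in_props, in_props)
            and props_match(cap.out_props, out_props))


html_to_wml = Capability("text/html", "text/wml")
gif_resize = Capability("image/gif", "image/gif",
                        out_props=(("X", "*"), ("Y", "*")))
dtd1_to_dtd2 = Capability("text/xml", "text/xml",
                          in_props=(("DTD", "dtd1"),),
                          out_props=(("DTD", "dtd2"),))

resize_ok = can_handle(gif_resize, "image/gif", (),
                       "image/gif", (("X", "50"), ("Y", "34")))
wrong_dtd = can_handle(dtd1_to_dtd2, "text/xml", (("DTD", "dtd2"),),
                       "text/xml", (("DTD", "dtd1"),))
print(resize_ok, wrong_dtd)  # True False
```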
Before the master transcoder receives a request, some party external to the transcoding system will have determined the object's current format and its desired output format. Given a request, the master transcoder examines the capabilities of each transcoder to find a single transcoder or a set of transcoders that can perform the requested operation. Once the master has selected the appropriate set, each is invoked in turn with two inputs: (a) the output of the previous transcoder (or the original input, in the case of the first transcoder selected); and (b) a "transcoder operation", which is a request to perform one or more of the operations advertised in a transcoder's capabilities statement. Every transcoder operation specifies the input format of the object being supplied, and the desired output format of the object to be produced.
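The invocation sequence just described, in which each selected transcoder receives the previous transcoder's output together with a transcoder operation, can be sketched as a simple loop. This is an illustrative sketch only. The function `run_chain`, the simplified `Operation` tuple (types only, with properties omitted), and the two toy transcoders are assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Simplified operation: (input type, output type); properties are omitted here.
Operation = Tuple[str, str]
Transcoder = Callable[[bytes, Operation], bytes]


def run_chain(data: bytes, steps: List[Tuple[Transcoder, Operation]]) -> bytes:
    """Invoke each transcoder in turn, feeding it the previous output."""
    for transcoder, operation in steps:
        data = transcoder(data, operation)
    return data


# Toy stand-ins for real transcoders (hypothetical, for illustration only).
def to_upper(data: bytes, operation: Operation) -> bytes:
    return data.upper()


def add_marker(data: bytes, operation: Operation) -> bytes:
    return data + b"!"


result = run_chain(b"hello", [
    (to_upper, ("text/a", "text/b")),
    (add_marker, ("text/b", "text/c")),
])
print(result)  # b'HELLO!'
```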
To perform its operation, the transcoder should produce an output object of the specified type that is equivalent to the input object, except as specified by the request's output properties. For example, a transcoder that advertised the capability to alter the X and Y dimensions of an image might receive this request:
| Input Format | Output Format |
|---|---|
| image/gif | image/gif |
| (none) | ("X", "50") ("Y", "34") |
To transcode a document from one DTD to another, the request might look like:
| Input Format | Output Format |
|---|---|
| text/xml | text/xml |
| ("DTD", "dtd2") | ("DTD", "dtd1") |
The master transcoder's job is to hide the details of transforming an object from the requestor, thereby letting the requestor concentrate on determining what format the object should be in. Because the capabilities of each transcoder are described in a formal language, the master transcoder can evaluate every available transcoder without any built-in understanding of the formats involved. The master transcoder need only apply simple pattern-matching rules to find a transcoder that can satisfy the request. In some cases, a request can be satisfied by a single transcoder. In other cases, this is not possible. Here again the formal description of capabilities is an advantage, as it enables the master transcoder to compose the operations of different transcoders to accomplish transformations that were not foreseen by their authors. In essence, the master transcoder tries to find a chain through the pool of available transcoders, matching output formats to input formats, until the request is satisfied (see Figure 1).
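The chain search can be illustrated with a short graph search over advertised capabilities. The sketch below uses breadth-first search, which is our assumption. The text specifies only that output formats are matched to input formats until the request is satisfied. Formats are compared for exact equality here, so wildcard properties are not handled, and the transcoder names, including the hypothetical `xml2html` capability, are invented for the example.

```python
from collections import deque


def find_chain(capabilities, source, target):
    """Return the shortest list of (name, in_fmt, out_fmt) steps, or None."""
    if source == target:
        return []
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        for name, in_fmt, out_fmt in capabilities:
            if in_fmt != fmt or out_fmt in seen:
                continue
            new_path = path + [(name, in_fmt, out_fmt)]
            if out_fmt == target:
                return new_path
            seen.add(out_fmt)
            queue.append((out_fmt, new_path))
    return None


# A format is modeled as (type, properties); all names are hypothetical.
HTML = ("text/html", ())
WML = ("text/wml", ())
XML1 = ("text/xml", (("DTD", "dtd1"),))
XML2 = ("text/xml", (("DTD", "dtd2"),))

caps = [
    ("html2wml", HTML, WML),
    ("xml_1to2", XML1, XML2),
    ("xml2html", XML2, HTML),
]

chain = find_chain(caps, XML1, WML)
print([name for name, _, _ in chain])  # ['xml_1to2', 'xml2html', 'html2wml']
```

No single transcoder in this pool converts the "dtd1" document to WML, yet the search composes three of them, which is the kind of unforeseen combination the architecture is meant to allow.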
Because transcoding is an intermediary application, we built our transcoding framework on top of WBI (see [1,2,7]). In particular, the transcoding framework is implemented as a WBI plugin that consists of the master transcoder and various specific transcoders (such as a GIF-to-JPEG transcoder, or an XML-to-XML converter based on XSL processing). In WBI terms, the master transcoder is a document editor that receives the original object (e.g., GIF) as input and produces a modified object (e.g., JPEG) as output according to some requirements. The master transcoder sits in the data stream between client and server. For each object that flows along this stream, WBI calls the master transcoder so that it may inspect the request and the original object to make an appropriate response. If transcoding is necessary, the master transcoder determines the appropriate transcoder or combination of transcoders and arranges for them to be called in the correct order.