IBM Research
 

DOM Example

The MegObjectExamplePlugin demonstrates the use of WBI's new MegObject API. MegObjects enable MEGs to pass certain kinds of objects rather than simply streamed binary or character data. This example uses that ability so that all MEGs work on the same Document Object Model (DOM), which eliminates the need to parse and reparse the stream repeatedly.

If you are not yet familiar with WBI's new MegObject feature, have a look at the simple MegObject example.

Try It Out

  1. Set Up the Plugin
    • Download the JTidy jar file and add it to your CLASSPATH.
    • Start WBI. Set up your web browser to use WBI as a proxy.
    • Register the MegObjectExamplePlugin. At the WBI console, type (on one line)
         register com/ibm/wbi/examples/dommegobject/domexample.reg
    • Check to see whether the plugin is registered and enabled. Go to the WBI Setup page. The DomExample plugin should be listed in the table with a checkmark next to its name. If the plugin is not listed, try registering it again. If the checkmark is not there, click on the box to the left of the plugin name.
    • Open another browser window. Use that window to try out the plugin, and use this window to display the documentation. (To open another window using Microsoft Internet Explorer, go to File -> New -> Window. To open a window using Netscape Navigator, go to File -> New -> Navigator Window.)
  2. Visit a Web Page
    • Access the WBI Homepage. You will notice that the plugin removes all color information and images and that all tables have become visible.
  3. Having Trouble?


What It Does

The DomExample uses an implementation of MegObject that encapsulates a DOM. Three MEGs work on this DOM to transform web pages into "low media" pages.


How It Works

Architecture

The DomExample plugin edits retrieved web pages (removing color information and images) before the pages are shown to the user. The MEGs rely on a simple implementation of MegObject to work together directly through a DOM rather than working from the byte stream.

MEG Model

This plugin sets up three HttpEditors that work on a DomMegObject. To avoid reparsing the stream, their priorities are set to the same value. This does not guarantee that no other MEG will run between them, but it makes such interference unlikely. If you write a plugin that uses the DomMegObject, give your MEGs the same priority that this plugin uses. If your plugin does not use the same DomMegObject implementation, you should probably use a different priority.

Implementation Details

This plugin relies heavily on the DomMegObject, a MegObject that encapsulates an org.w3c.dom.Document.

Of special interest is its public constructor, which takes a RequestEvent as an argument. In the MegObjectExample plugin, each MEG had to examine the RequestEvent in order to get or create the MegObject. That responsibility now lies with the DomMegObject, so the MEGs can obtain a DomMegObject very easily.

HTML is not a subset of XML, which means that HTML documents are not easily parsed into a DOM. For this example plugin we used the JTidy implementation, which does a great job but is still not perfect. If you would like to use a different parser, wrap it in an Html2DomParser and call DomMegObject's setParser() method.
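
The wrapper idea can be sketched with plain JDK classes. This is only an illustration under stated assumptions: HtmlParserSketch, StrictXmlParserSketch, BrFixingParserSketch and ParserHostSketch are hypothetical names, and the real Html2DomParser interface and DomMegObject.setParser() signature may look different. The strict parser stands in for "a parser that expects XML", and the toy repairing parser stands in for a tolerant parser such as JTidy.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical stand-in for an object with a swappable parser (not DomMegObject). */
class ParserHostSketch {
    private HtmlParserSketch parser = new StrictXmlParserSketch();

    void setParser(HtmlParserSketch parser) { this.parser = parser; }

    Document load(String markup) {
        return parser.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        ParserHostSketch host = new ParserHostSketch();
        host.setParser(new BrFixingParserSketch()); // swap in the tolerant parser
        Document dom = host.load("<html><body>a<br>b</body></html>");
        System.out.println(dom.getElementsByTagName("br").getLength());
    }
}

/** Hypothetical wrapper interface; the real Html2DomParser may differ. */
interface HtmlParserSketch {
    Document parse(InputStream in);
}

/** Strict parser: accepts only well-formed XML, so much real HTML is rejected. */
class StrictXmlParserSketch implements HtmlParserSketch {
    public Document parse(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}

/** Toy repairing parser: closes bare <br> tags, then delegates to the strict one. */
class BrFixingParserSketch implements HtmlParserSketch {
    private final HtmlParserSketch delegate = new StrictXmlParserSketch();

    public Document parse(InputStream in) {
        try {
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            byte[] repaired = text.replace("<br>", "<br/>").getBytes(StandardCharsets.UTF_8);
            return delegate.parse(new ByteArrayInputStream(repaired));
        } catch (IOException e) {
            throw new IllegalArgumentException("could not read input", e);
        }
    }
}
```

With the default strict parser, the same input is rejected because the br element is never closed. After the swap, the host produces a DOM without any change to the code that consumes it.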

Converting a DOM back to HTML is easier, but some pitfalls still apply. XmlWriter is a helper class that converts a DOM into a stream.

As you can see, a MegObject implementation offers three functions:

  • Storage of a certain type of data (for example org.w3c.dom.Document)
  • Conversion from at least an InputStream to this type of data
  • Conversion from this data to at least an OutputStream
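
The three functions listed above can be sketched in a small, self-contained class. DomHolderSketch is a hypothetical name and does not implement the WBI MegObject interface. It also assumes well-formed XML input and uses the JDK's own parser and serializer, where the plugin uses JTidy and XmlWriter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Hypothetical DOM-carrying holder; a stand-in, not the WBI DomMegObject. */
class DomHolderSketch {
    // 1. Storage of one type of data.
    private Document document;

    Document getDocument() { return document; }

    // 2. Conversion from an InputStream to the stored type (strict XML only).
    void readFrom(InputStream in) {
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalStateException("input is not well-formed XML", e);
        }
    }

    // 3. Conversion from the stored type to an OutputStream.
    void writeTo(OutputStream out) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Force XML output so the serializer does not switch to HTML rules.
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.transform(new DOMSource(document), new StreamResult(out));
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize the DOM", e);
        }
    }

    /** Convenience for the demo: parse a string, then serialize it again. */
    static String roundTrip(String markup) {
        DomHolderSketch holder = new DomHolderSketch();
        holder.readFrom(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        holder.writeTo(out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<html><body><p>hi</p></body></html>"));
    }
}
```

Between readFrom() and writeTo(), any number of consumers can call getDocument() and edit the tree in place, which is the point of passing an object instead of a byte stream.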

The three editors simply create a DomMegObject from the RequestEvent, recurse through its DOM, change some nodes, and put the DomMegObject back into the RequestEvent. Without the DomMegObject, each MEG would have to parse the HTML document itself and write all of the bytes back out to the stream, a very expensive process.
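
The "recurse through its DOM, change some nodes" step might look roughly like the sketch below. It is not the plugin's source: LowMediaSketch is a hypothetical name, the list of color attributes is an assumption, setting border="1" is only one guess at how tables could be made visible, and a single pass here does the work that the plugin divides among ColorRemover, ImageRemover and TableShower. The input is parsed with the JDK's XML parser, so it must be well-formed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical one-pass version of the three edits described in this document. */
class LowMediaSketch {
    // Assumption: these attributes are treated as "color information".
    static final String[] COLOR_ATTRS = {"bgcolor", "text", "link", "vlink", "alink", "color"};

    /** Recursively edits all descendants of the given node, in place. */
    static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) child;
                if ("img".equalsIgnoreCase(el.getTagName())) {
                    node.removeChild(el); // drop images entirely
                } else {
                    for (String attr : COLOR_ATTRS) {
                        el.removeAttribute(attr); // no-op when the attribute is absent
                    }
                    if ("table".equalsIgnoreCase(el.getTagName())) {
                        el.setAttribute("border", "1"); // make the table visible
                    }
                    strip(el);
                }
            }
            child = next;
        }
    }

    /** Parses well-formed markup into a DOM (stand-in for the JTidy step). */
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body bgcolor=\"red\"><img src=\"a.png\"/>"
                + "<table><tr><td><font color=\"blue\">x</font></td></tr></table></body></html>");
        strip(doc); // start at the Document node so the root element is edited too
        System.out.println(doc.getElementsByTagName("img").getLength());
    }
}
```

Because every edit works on the same Document instance, running a second or third pass costs only a tree walk, with no parsing or serialization in between.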

Some key WBI classes that were used:

com.ibm.wbi.protocol.http
  • HttpEditor: extended by ColorRemover, ImageRemover and TableShower
  • HttpPlugin: extended by DomExample
com.ibm.wbi
  • RequestEvent: used by DomMegObject
  • MegObject: implemented by DomMegObject
org.w3c.dom
  • Document: encapsulated by DomMegObject
org.w3c.tidy
  • Tidy: wrapped by TidyParser


Known Problems

  • A very basic problem occurs when HTML is parsed into an XML DOM: HTML is not a proper subset of XML. The parser tries to compensate for this but is not always successful. JavaScript, for example, tends not to work in pages parsed with the DomMegObject.
  • If you experience any other problem, please inform us!


Source Files

domexample.reg
    Contains the code necessary to register the plugin.
DomExample.java
    Contains the class definitions for the plugin and for ColorRemover, ImageRemover and TableShower.
DomMegObject.java
    Contains the class definition for the DomMegObject.
XmlWriter.java
    Contains the class definition for the XmlWriter, used by the DomMegObject.
Html2DomParser.java
    A wrapper interface for parsers.
TidyParser.java
    A wrapper class for org.w3c.tidy.Tidy.