IBM Research

Page Filter

Try It Out

To see what the PageFilter plugin does, you should view some web pages twice: first with the plugin disabled, and then with the plugin enabled.

  1. Set Up the Plugin
    • Start WBI. Set up your web browser to use WBI as a proxy.
    • Register the PageFilter plugin. At the WBI console, type the following on one line:
         register com/ibm/wbi/examples/pagefilter/pagefilter.reg
    • Check whether the plugin is registered and enabled. Go to the WBI Setup page. The PageFilter plugin should be listed in the table with a checkmark next to its name. If the plugin is not listed, try registering it again. If the checkmark is not there, click on the box to the left of the plugin name.
    • Open another browser window. Use that window to try out the plugin, and use this window to display the documentation. (To open another window using Microsoft Internet Explorer, go to File -> New -> Window. To open a window using Netscape Navigator, go to File -> New -> Navigator Window.)

  2. View Web Pages Without the Plugin
    • Now disable the plugin so that you can see what the original web pages look like. Go to the WBI Setup page. Disable the PageFilter plugin by clicking on the box to the left of its name.
    • Access the IBM Homepage, the IBM Support Homepage, and the WBI Homepage. Notice that all three pages are displayed normally and that their links work.

  3. View Web Pages With the Plugin
    • Now enable the plugin so that you can see how it changes the pages that are displayed in your browser. Go to the WBI Setup page. Enable the PageFilter plugin by clicking on the box to the left of its name.
    • Access the IBM Homepage. Notice that it appears the same way as before. Now access the IBM Support Homepage. Notice that some of the links that were visible before have been replaced by normal (non-link) text. Finally, try to access the WBI Homepage. This should take you to a page of links, rather than to the WBI Homepage.


What It Does

The PageFilter plugin blocks a web browser from displaying sites that are not on a list of approved sites. When a user tries to access an unapproved site, the browser displays a web page that contains links to all of the approved sites. If a web page contains a link to an unapproved site, the link text is replaced by normal (non-link) text. A web form can be used to add to the list of approved sites.


How It Works

Architecture

The plugin maintains a database of approved web sites/pages. New sites/pages can be added to the database by using a web form. Each time the browser makes a request, the plugin checks the database to determine whether the requested page is approved. To be approved, the requested URL must match at least one of the entries in the database. If the request is approved, the page is retrieved and displayed to the user. If the request is not approved, the browser is diverted to a web page that contains a list of approved links.
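The approval rule above (a URL is approved if it matches at least one database entry) can be sketched in Java as follows. This is a minimal sketch, not the plugin's Pattern/PatternPart code: it assumes, hypothetically, that each entry is a glob-style string in which "*" matches any run of characters, and it uses the standard java.util.regex.Pattern class rather than the plugin's own Pattern class.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch of the approval check: a URL is approved if it matches at least one
// entry in the approved database.
// ASSUMPTION: entries are glob-style strings where '*' matches any run of
// characters; the plugin's own Pattern/PatternPart classes may use another syntax.
class ApprovalCheck {

    // Translate a glob entry such as "http://www.ibm.com/*" into a regex.
    // Literal segments are quoted so '.', '?', etc. only match themselves.
    static Pattern compileEntry(String entry) {
        StringBuilder regex = new StringBuilder();
        int start = 0;
        int star;
        while ((star = entry.indexOf('*', start)) >= 0) {
            regex.append(Pattern.quote(entry.substring(start, star))).append(".*");
            start = star + 1;
        }
        regex.append(Pattern.quote(entry.substring(start)));
        return Pattern.compile(regex.toString());
    }

    // True if the whole requested URL matches at least one approved entry.
    static boolean isApproved(String url, List<String> approvedEntries) {
        for (String entry : approvedEntries) {
            if (compileEntry(entry).matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With the entries "http://www.ibm.com/*" and "http://www.research.ibm.com/", a request for http://www.ibm.com/support/ is approved, while http://www.example.com/ is not. A real implementation would compile each entry once when it is added rather than on every request.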

When an approved page is retrieved from the web, each link is checked to determine whether it points to an approved page. If it points to an approved page, then the link is not edited. If not, then the link is removed (i.e., the anchor tag is removed but the anchor text remains).
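The link-removal rule can be illustrated with a short Java sketch: each anchor whose href is not approved is replaced by its anchor text, and approved anchors are left as they are. The regular expression here is an assumption made for brevity. The plugin itself relies on WBI's LinkAnnotationEditor to find links, and this sketch ignores unquoted href values and other HTML edge cases.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex-based sketch of link removal: for each <a href="...">text</a>, keep the
// whole anchor if the href is approved; otherwise keep only the anchor text.
// ASSUMPTION: a regex stands in for WBI's LinkAnnotationEditor; unquoted href
// values and malformed HTML are not handled.
class LinkStripper {

    // Group 1: the href value. Group 2: everything between <a ...> and </a>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String stripUnapprovedLinks(String html, Predicate<String> isApproved) {
        Matcher m = ANCHOR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String href = m.group(1);
            String anchorText = m.group(2);
            // Approved: re-emit the original anchor. Unapproved: emit only its text.
            String replacement = isApproved.test(href) ? m.group() : anchorText;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, with a predicate that approves only URLs starting with http://www.ibm.com/, the input `<a href="http://www.ibm.com/">IBM</a> and <A HREF='http://x.example/'>X</A>` becomes `<a href="http://www.ibm.com/">IBM</a> and X`.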

MEG Model

The PageFilter plugin consists of two generators and two editors. The following diagram illustrates how a request is processed by the PageFilter plugin. In the diagram, "RE" refers to the request editor, "G" refers to either of the two generators described in Step 2 (WBI's default generator or the pagefilter generator), and "DE" refers to the document editor.

[Figure: WBI processing path]

  • Step 1 (RE): When the browser makes a request, the request editor checks to see whether the requested web page is in the database of approved pages. If the requested page is in the database, then the request editor does not change the request. If the page is not in the database, then the request editor changes the URL in the request.
  • Step 2 (G): The PageFilter plugin uses two generators, one for each of the cases in Step 1 (request approved vs. request not approved). Requests for approved pages are handled by WBI's default generator. This generator acts as a transparent proxy, retrieving the requested page from its server. Requests for unapproved pages are handled by a special pagefilter generator. This generator creates a web page containing links to approved sites.
  • Step 3 (DE): Once a page has been received from a generator, a document editor checks whether the page contains any links to unapproved sites. If there are any such links, the document editor replaces these links with normal text. Once the document editor is done, the page is ready to be displayed in the browser. Note that the document editor does not actually need to be run on pages that come from the pagefilter generator, as the links on these pages always point to approved sites.
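The three steps above can be modeled as a small routing pipeline. In this Java sketch, approval is a plain prefix check, "http://pagefilter.invalid/links" is a hypothetical internal URL for the page of approved links, and the `fetch` and `documentEditor` functions stand in for WBI's default generator and the PageFilter document editor. None of these names come from WBI. The sketch also shows why the document editor can be skipped for the generated links page.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of the RE -> G -> DE flow. ASSUMPTIONS: prefix-based approval,
// a hypothetical internal URL for the links page, and caller-supplied
// functions standing in for WBI's default generator and the document editor.
class PageFilterFlow {
    // Hypothetical internal URL that the request editor redirects to.
    static final String LINKS_URL = "http://pagefilter.invalid/links";

    private final List<String> approvedPrefixes;
    private final UnaryOperator<String> fetch;          // stands in for WBI's default generator
    private final UnaryOperator<String> documentEditor; // stands in for the link-removing DE

    PageFilterFlow(List<String> approvedPrefixes,
                   UnaryOperator<String> fetch,
                   UnaryOperator<String> documentEditor) {
        this.approvedPrefixes = approvedPrefixes;
        this.fetch = fetch;
        this.documentEditor = documentEditor;
    }

    private boolean isApproved(String url) {
        return approvedPrefixes.stream().anyMatch(url::startsWith);
    }

    // Step 1 (RE): approved requests pass through unchanged; the rest are redirected.
    String editRequest(String url) {
        return isApproved(url) ? url : LINKS_URL;
    }

    // Step 2 (G): the pagefilter generator builds the links page; anything else is fetched.
    String generate(String url) {
        if (!url.equals(LINKS_URL)) {
            return fetch.apply(url);
        }
        StringBuilder html = new StringBuilder("<ul>");
        for (String prefix : approvedPrefixes) {
            html.append("<li><a href=\"").append(prefix).append("\">")
                .append(prefix).append("</a></li>");
        }
        return html.append("</ul>").toString();
    }

    // Full path: RE, then G, then DE. DE is skipped for the links page,
    // because every link on it already points to an approved site.
    String handle(String requestedUrl) {
        String url = editRequest(requestedUrl);
        String page = generate(url);
        return url.equals(LINKS_URL) ? page : documentEditor.apply(page);
    }
}
```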

Implementation Details

  • Filtering out unapproved pages: The PageRequestEditor (a request editor) checks whether the requested page is approved. If it is approved, the PageRequestEditor throws a RequestRejectedException, which tells WBI that this editor declines to handle the request, so the request passes through unchanged. If it is not approved, a new request is created (with a different URL) that makes the browser display a web page of links to approved pages/sites.

  • Editing links: The PageFilterEditor (a document editor) edits approved pages by removing links to unapproved web pages/sites. It does this by extending LinkAnnotationEditor, which passes each link (i.e., each anchor tag and its anchor text) to the editLink method. If the href of the tag matches a pattern in the database (the link points to an approved site), editLink does nothing. Otherwise, editLink sets the link tag to null, indicating that the link is to be removed. Only the anchor tag is removed in this case; the anchor text remains.

  • Retrieving approved pages: WBI's DefaultHttpGenerator retrieves approved web pages.

  • Creating page of approved links: The PageFilterGenerator handles requests which were created by the PageRequestEditor. The generator creates a String of HTML code containing links to approved sites. Then, it uses a StaticHtmlGenerator to display the HTML code in a web page.

  • Storing approved pages/sites: Each approved web page or web site is represented by a Page object. A Page contains a pattern to match URLs against, plus a sample URL and a title. The sample URL and title are used to generate the page of links to approved sites. The plugin maintains a small database of these Page objects.

  • Changing the "approved" database: The PageFilterFormGenerator handles the web form used to add Page objects to the "approved" database. The generator both creates the form and processes any data entered into it. The database itself is an instance of Section (com.ibm.wbi.util.Section).

  • Some key WBI classes that were used:

    com.ibm.wbi.protocol.http.beans
      StaticHtmlGenerator: used by PageFilterGenerator and PageFilterFormGenerator
      FixContentTypeEditor: ensures that the Content-Type in the HTTP header matches the actual content type of the page
      FormHelper: helps extract query data from web forms
      LinkAnnotationEditor: extended by PageFilterEditor (to edit out unapproved links)
      NewUrlRequestEditor: used by PageRequestEditor
    com.ibm.wbi.protocol.http
      HttpGenerator: extended by PageFilterGenerator and PageFilterFormGenerator
      HttpRequestEditor: extended by PageRequestEditor
    com.ibm.wbi.util
      Section: used by the PageFilter plugin itself (to keep track of Pages)
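The Page record and the page of approved links built from it can be sketched in Java. Field names, the HTML layout, and the `escape` helper are illustrative assumptions, and a List stands in for the Section the plugin actually uses. The returned String corresponds to the HTML that PageFilterGenerator hands to a StaticHtmlGenerator.

```java
import java.util.List;

// Sketch of approved-page records and the page of links built from them.
// ASSUMPTIONS: field names, HTML layout, and escape() are illustrative; the
// plugin stores its Pages in a WBI Section rather than a List.
class ApprovedLinksPage {

    // One approved site/page: a URL pattern (used for matching elsewhere)
    // plus a sample URL and title used on the links page.
    static final class Page {
        final String pattern;
        final String sampleUrl;
        final String title;

        Page(String pattern, String sampleUrl, String title) {
            this.pattern = pattern;
            this.sampleUrl = sampleUrl;
            this.title = title;
        }
    }

    // Minimal HTML escaping so titles and URLs cannot break the generated markup.
    // '&' is replaced first so the other entities are not double-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build the HTML string for the page of links to approved sites.
    static String render(List<Page> approved) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>Approved sites</title></head><body>\n");
        html.append("<p>The requested page is not on the approved list.</p>\n<ul>\n");
        for (Page p : approved) {
            html.append("<li><a href=\"").append(escape(p.sampleUrl)).append("\">")
                .append(escape(p.title)).append("</a></li>\n");
        }
        html.append("</ul>\n</body></html>\n");
        return html.toString();
    }
}
```

Keeping the sample URL separate from the pattern matters: a pattern such as "http://www.ibm.com/*" is not itself a usable link, so each Page carries a concrete URL to put in the href.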
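Processing the "add an approved site" form amounts to decoding the submitted query string and appending a new entry to the database. In this Java sketch, the form field names "pattern", "url", and "title" are hypothetical, and the hand-written parser stands in for WBI's FormHelper. It uses the `URLDecoder.decode(String, Charset)` overload, which requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of handling the form that adds approved sites.
// ASSUMPTIONS: hypothetical field names "pattern", "url", "title"; a simple
// parser stands in for WBI's FormHelper; a List stands in for the Section.
class ApprovedPageForm {

    static final class Page {
        final String pattern;
        final String sampleUrl;
        final String title;

        Page(String pattern, String sampleUrl, String title) {
            this.pattern = pattern;
            this.sampleUrl = sampleUrl;
            this.title = title;
        }
    }

    final List<Page> database = new ArrayList<>();

    // Split "a=1&b=2" into a map, URL-decoding names and values ('+' becomes a space).
    static Map<String, String> parseQuery(String query) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return fields;
    }

    // Add a Page built from the submitted form. Returns false (and adds nothing)
    // if the pattern or sample URL is missing; a blank title falls back to the URL.
    boolean addFromForm(String query) {
        Map<String, String> f = parseQuery(query);
        String pattern = f.getOrDefault("pattern", "").trim();
        String url = f.getOrDefault("url", "").trim();
        if (pattern.isEmpty() || url.isEmpty()) {
            return false;
        }
        String title = f.getOrDefault("title", "").trim();
        database.add(new Page(pattern, url, title.isEmpty() ? url : title));
        return true;
    }
}
```

For instance, the query `pattern=http%3A%2F%2Fwww.ibm.com%2F*&url=http%3A%2F%2Fwww.ibm.com%2F&title=IBM+Homepage` adds a Page with pattern "http://www.ibm.com/*", sample URL "http://www.ibm.com/", and title "IBM Homepage".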


Source Files

  • Page.java: Contains the class definition for Page (data structure for an approved page).
  • pagefilter.ini: Initial database of approved sites/pages.
  • PageFilter.java: Contains the class definitions for PageFilter (the plugin itself), PageFilterGenerator (page of links to approved pages), PageFilterFormGenerator (form to change the list of approved sites), PageFilterEditor (edits out links to unapproved sites), and PageRequestEditor (changes the request URL if it is destined for an unapproved site).
  • pagefilter.reg: Contains the code necessary to register the plugin.
  • Pattern.java: Contains the class definitions for Pattern and PatternPart (data structures used to match requested URLs against entries in the approved database).