IBM Research
 

Dictionary Definition Link

Try It Out

  1. Set up the Plugin
    • Start WBI. Set up your web browser to use WBI as a proxy.
    • Register the Definition Link plugin. At the WBI console, type (on one line)
         register
            com/ibm/wbi/projects/dictionary/dictionary.reg
    • Check to see whether the plugin is registered and enabled. Go to the WBI Setup page. The Definition Link plugin should be listed in the table with a checkmark next to its name. If the plugin is not listed, try registering it again. If the checkmark is not there, click on the box to the left of the plugin name.
    • Open another browser window. Use that window to try out the plugin, and use this window to display the documentation. (To open another window using Microsoft Internet Explorer, go to File -> New -> Window. To open a window using Netscape Navigator, go to File -> New -> Navigator Window.)
  2. Choose which dictionaries to load.

  3. Visit Some Web Pages

  4. Having Trouble?


What It Does

The Dictionary Definition Link plugin scans each document for words or phrases that appear in one of its loaded dictionaries and edits the document to add a link from each such word or phrase to its definition on the web. To distinguish these added links from links that appear in the original document, the plugin italicizes the anchor text of the links it adds.


How It Works

Architecture

In general, the Dictionary Definition Link plugin performs three functions:

  • editing the HTTP stream to make sure all returned content-type fields are correct (circumventing bugs in some Web servers)
  • editing pages to insert links for known terms
  • generating pages that allow the user to load or unload dictionaries, or choose among dictionaries for a word that appears in several of them.

MEG Model

The AddLinksEditor calls a method in the DefinitionLinkPlugin to find each word's definition URL. In the special case that a word appears in more than one dictionary, that method returns a URL that triggers a DictionaryChoiceGenerator, which provides a list of the dictionaries that define the word, with links to the definitions and to full descriptions of each dictionary. If the user selects the link for a dictionary's description, the DictionaryChoiceGenerator also parses the resulting query and marks up the description.
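
The lookup rule described above can be sketched in plain Java. This is only an illustration, not the plugin's actual code: the class name, the method names, and the choice URL http://_dictionary/choose?word=... are assumptions made for the example (the source names only the setup URL), and no WBI classes are used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "find a word's definition URL" step.
class DefinitionLookupSketch {
    // Loaded dictionaries by name; each maps a word to its definition URL.
    private final Map<String, Map<String, String>> dictionaries = new LinkedHashMap<>();

    void addDictionary(String name, Map<String, String> entries) {
        dictionaries.put(name, entries);
    }

    // Returns the URL to link to, or null if no loaded dictionary defines the word.
    String findDefinitionUrl(String word) {
        List<String> urls = new ArrayList<>();
        for (Map<String, String> entries : dictionaries.values()) {
            String url = entries.get(word);
            if (url != null) {
                urls.add(url);
            }
        }
        if (urls.isEmpty()) {
            return null;
        }
        if (urls.size() == 1) {
            return urls.get(0); // exactly one dictionary: link straight to the definition
        }
        // Several dictionaries define the word: return a special URL that a
        // choice generator could intercept. The path and parameter are assumed.
        try {
            return "http://_dictionary/choose?word=" + URLEncoder.encode(word, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

In this sketch a word defined by one dictionary resolves directly to its definition, while a word defined by several resolves to the special choice URL.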

If the user goes to the special URL, http://_dictionary/setup, the ControlPanelGenerator creates a form that shows which dictionaries are available, and which of those are loaded, allowing the user to load or unload them by changing and submitting the form. The same generator accepts query data from that form and performs the requested action.

Scenario: Accessing an External Web Page

Typically, then, the processing path is as follows: The user enters a URL or clicks on a link to a Web page. A request is generated and sent to WBI. A WBI generator (possibly the HttpDefaultGenerator) fetches the page from the server, producing an HTTP response. An object representing the response passes through the FixContentTypeEditor, which might correct the content type for the page, and then to the AddLinksEditor, which changes the contents of the page by adding links from words to their definitions.

Implementation Details

  • The dictionary. The dictionary is a hashtable that maps words to definition URLs. We represent these on disk as serialized sequences of key-value pairs (strings) rather than as serialized hashtables so that we don't run into hashtable version conflicts between JDK versions.

    These dictionaries were created by crawling the definition sites, such as the On-Line Medical Dictionary and Duhaime's Law Dictionary, picking out words and recording their links. Because many words found in these dictionaries were common English words, we removed those that also appeared in a non-specialized dictionary.

  • Editing pages. The AddLinksEditor parses each page with an HtmlEditor. For each chunk of text between tags (except for the text of existing links), it picks out words using whitespace and punctuation as delimiters. If a one- or two-word phrase appears in the dictionary, the editor wraps a link around the phrase, generating the new block of hypertext with an HtmlHelper (from the PersonalHistoryPlugin). Before the link, the editor adds an <i> tag, and after the link, it adds an </i> tag to distinguish these inserted links from ones that appear in the original document.

  • Some key WBI classes that were used:

    com.ibm.wbi
      HttpEditor extended by AddLinksEditor
      RequestEvent passed automatically by the WBI proxy to method handleRequest of class AddLinksEditor
      HttpPlugin extended by DefinitionLinkPlugin
    com.ibm.wbi.protocol.http
      DocumentInfo used by AddLinksEditor
    com.ibm.wbi.protocol.http.beans
      FixContentType used by DefinitionLinkPlugin
    com.ibm.wbi.markuplanguage.html
      HtmlEditor used by AddLinksEditor
      HtmlTag used by AddLinksEditor
      HtmlText used by AddLinksEditor
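
The link-insertion step described under "Editing pages" can be sketched for a single chunk of plain text. This is a simplified illustration that does not use the WBI HtmlEditor or HtmlHelper classes. Several choices in it are assumptions made for the example: the class and method names, the regular expression used to pick out words, trying two-word phrases before single words, and requiring only whitespace between the two words of a phrase. It also does no HTML escaping of the URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: wrap dictionary terms in italicized links.
class AddLinksSketch {
    // A "word" is a run of letters or digits; everything else is a delimiter.
    private static final Pattern WORD = Pattern.compile("\\p{Alnum}+");

    static String addLinks(String text, Map<String, String> dictionary) {
        // Record the [start, end) offsets of every word in the chunk.
        List<int[]> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(new int[] { m.start(), m.end() });
        }

        StringBuilder out = new StringBuilder();
        int copied = 0; // offset in text up to which output has been written
        int i = 0;
        while (i < words.size()) {
            int start = words.get(i)[0];
            int end = words.get(i)[1];
            int used = 1;
            String url = null;

            // Try a two-word phrase first, if only whitespace separates the words.
            if (i + 1 < words.size()) {
                int[] next = words.get(i + 1);
                String gap = text.substring(end, next[0]);
                if (gap.trim().isEmpty()) {
                    String phrase = text.substring(start, end) + " "
                            + text.substring(next[0], next[1]);
                    url = dictionary.get(phrase);
                    if (url != null) {
                        end = next[1];
                        used = 2;
                    }
                }
            }
            // Fall back to the single word.
            if (url == null) {
                url = dictionary.get(text.substring(start, end));
            }
            if (url != null) {
                out.append(text, copied, start); // untouched text before the term
                out.append("<i><a href=\"").append(url).append("\">")
                   .append(text, start, end)
                   .append("</a></i>");
                copied = end;
            }
            i += used;
        }
        out.append(text, copied, text.length()); // remaining tail
        return out.toString();
    }
}
```

A real editor would apply this only to text between tags and would skip the text of existing links, as described above.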


Known Problems

  1. On some systems, the On-Line Medical Dictionary takes a long time to load. The delay comes from the computational overhead of deserializing the strings in the data file to build the hashtable.

  2. Under Linux, the Java interpreter crashes with an OutOfMemoryError.
    The dictionaries take up a good chunk of memory, and the operation of deserializing them takes even more. As of Java 1.2, Linux versions of the JVM do not properly allocate heap memory. Instead, the heap size must be set explicitly using the -mx flag. For example, the command java -mx50M Run works without problems. (On machines with less than 50 MB of physical memory, the operating system will provide virtual memory as needed.)
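
To make the on-disk representation and the deserialization cost concrete, here is a minimal sketch of storing a dictionary as a serialized sequence of key-value strings and rebuilding the hashtable from it. The exact layout of the plugin's data files is not documented here, so the layout used (an entry count followed by alternating word and URL strings) is an assumption, as are the class and method names. The sketch works on byte arrays rather than files to stay self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch of a dictionary stored as serialized strings.
class DictionaryFileSketch {
    // Writes an entry count, then each word followed by its URL (assumed layout).
    static byte[] toBytes(Map<String, String> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeInt(entries.size());
            for (Map.Entry<String, String> e : entries.entrySet()) {
                out.writeObject(e.getKey());
                out.writeObject(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the strings back one by one and builds a fresh hashtable, so the
    // stored data never depends on Hashtable's own serialized form.
    static Hashtable<String, String> fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            // Presize the table so it does not rehash repeatedly while loading.
            Hashtable<String, String> table = new Hashtable<>(Math.max(16, count * 2));
            for (int i = 0; i < count; i++) {
                String word = (String) in.readObject();
                String url = (String) in.readObject();
                table.put(word, url);
            }
            return table;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every entry costs two string deserializations plus a hashtable insertion, which is consistent with the load time and memory use described above for large dictionaries.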


Source Files

dictionary.reg
Contains the information necessary to register the plugin.
dictionary.ini
Contains information about the available dictionaries.
AddLinksEditor.java
Contains the class definition for AddLinksEditor, which scans documents for dictionary terms and adds links to their definitions.
DefinitionLinkPlugin.java
Contains the class definition for DefinitionLinkPlugin, the plugin itself.
ControlPanelGenerator.java
Contains the class definition for ControlPanelGenerator, which creates the form for choosing which dictionaries to load.
DictionaryData.java
Contains the class definition for DictionaryData, which represents important data for a dictionary--its name, an HTML description of its contents, and a reference to the dictionary itself (a hashtable mapping words to URLs). It also contains methods for loading and unloading serialized dictionaries.
DictionaryChoiceGenerator.java
Contains the class definition for DictionaryChoiceGenerator, which generates pages for choosing among dictionaries when a word appears in more than one.
MakeChangesGenerator.java
Contains the class definition for MakeChangesGenerator, which takes the query data from the ControlPanelGenerator form, compares it against the current load status of all the dictionaries, and decides which dictionaries to load and unload. While the expensive part of the loading process, namely the deserialization, is in progress, the MakeChangesGenerator marks up a web page asking the user to wait. Once the wait is over, the generator adds a button that lets the user return to the control panel.
omd.data
Contains the data for the On-line Medical Dictionary, represented as a sequence of words that map to definition URLs. This is the full version of the dictionary, minus certain common, non-medical words.
duhaime.data
Contains the data for Duhaime's Law Dictionary.
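
The comparison that MakeChangesGenerator is described as performing (requested dictionaries against currently loaded ones) amounts to two set differences. The sketch below shows that idea only. The class name, method names, and the use of dictionary names as plain strings are assumptions for illustration, not the plugin's actual interface.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of deciding which dictionaries to load and unload.
class LoadPlanSketch {
    // Dictionaries the user requested that are not loaded yet.
    static Set<String> toLoad(Set<String> requested, Set<String> loaded) {
        Set<String> result = new TreeSet<>(requested);
        result.removeAll(loaded);
        return result;
    }

    // Dictionaries that are loaded but no longer requested.
    static Set<String> toUnload(Set<String> requested, Set<String> loaded) {
        Set<String> result = new TreeSet<>(loaded);
        result.removeAll(requested);
        return result;
    }
}
```

Dictionaries that appear in both sets need no action, so the expensive deserialization runs only for names returned by toLoad.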

Acknowledgments

  • The On-line Medical Dictionary is provided as a public service by the CancerWEB Project. We thank Dr. Graham Dark for granting us permission to link to the dictionary.

    The CancerWEB project allows access to the dictionary free of charge in the hope that it will be useful, but without any guarantees of accuracy; for further information on the OMD's use, please read its terms of use.

  • Duhaime's Law Dictionary provides definitions of basic law terms in plain language. We thank Lloyd Duhaime, the lawyer who wrote the dictionary and who publishes it on his Web site as a public service. Duhaime's Law Dictionary is provided in the hopes that it will be useful, but without any guarantee of accuracy of its contents. For further information about Duhaime's law firm and its activities, visit its site.