As the searcher marks documents as relevant or trash, the FIRE engine analyzes them in the background, looking for patterns of words that distinguish relevant documents from trash documents. Using a greedy search, the engine generates an optimized boolean query which
  1. retrieves the relevant documents
  2. avoids the trash documents
  3. uses a moderate set of words.

The third goal gives the engine a better chance of differentiating good documents from bad ones with useful words rather than with statistical anomalies in the text. The engine does not seek the minimal query that optimizes the search results; it seeks a somewhat broader set of words. A broader set helps to stimulate the searcher's understanding of the query and to improve query results.
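The greedy search described above can be illustrated with a short sketch. This is not FIRE's actual algorithm, which the text does not specify. It assumes each marked document is a set of words, that the generated query is a simple OR over terms, and that a query is scored as relevant documents retrieved minus trash documents retrieved. The names `query_score`, `suggest_terms`, and `max_terms` are hypothetical. The sketch also stops as soon as no term improves the score, so it does not model FIRE's preference for a broader set of words.

```python
def query_score(terms, relevant, trash):
    """Score an OR query: relevant documents retrieved minus trash retrieved."""
    def hits(doc):
        return any(term in doc for term in terms)
    return sum(1 for doc in relevant if hits(doc)) - sum(1 for doc in trash if hits(doc))


def suggest_terms(relevant, trash, max_terms=4):
    """Greedily add the term that most improves the score (assumed scoring rule)."""
    # Only words that occur in some relevant document are candidates.
    candidates = set().union(*relevant)
    chosen = []
    current = 0  # score of the empty query, which retrieves nothing
    while candidates and len(chosen) < max_terms:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates),
                   key=lambda t: query_score(chosen + [t], relevant, trash))
        best_score = query_score(chosen + [best], relevant, trash)
        if best_score <= current:
            break  # no remaining term improves the query
        chosen.append(best)
        candidates.discard(best)
        current = best_score
    return chosen


# Toy data, invented for illustration only.
relevant = [{"ibms", "os2"}, {"microsoft", "nt"}]
trash = [{"os2", "announce"}, {"nt", "announce"}]
print(suggest_terms(relevant, trash))
```

The `max_terms` cap stands in for the "moderate set of words" goal. A closer approximation of the behaviour described would keep adding terms that do not hurt the score instead of stopping at the first non-improving step.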

graphic of Suggested Query Terms

The results of the FIRE engine's analysis are displayed unobtrusively as a set of "Suggested Query Terms". This display takes the same form as the initial query, so that the searcher can understand it easily.

In this example, after marking 5 documents, the FIRE engine has concluded that the query should be narrowed by requiring documents to also include at least one of the following search terms: ibms, microsoft, operating, or industry.
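As a minimal sketch of what this narrowing means, the refined query can be read as the original query AND (at least one suggested term). The original query is not given in this passage, so `original_query` below is a hypothetical stand-in. The sketch also assumes each document has been reduced to a set of normalized words in the same form as the suggestions (for example "ibms").

```python
SUGGESTED_TERMS = ["ibms", "microsoft", "operating", "industry"]


def narrowed_match(doc_words, original_query, suggested=SUGGESTED_TERMS):
    """True if the document satisfies the original query AND contains
    at least one of the suggested terms."""
    return original_query(doc_words) and any(t in doc_words for t in suggested)


# Hypothetical original query: the document mentions OS/2 or Windows NT.
def original_query(words):
    return "os2" in words or "nt" in words


print(narrowed_match({"os2", "ibms", "review"}, original_query))   # kept
print(narrowed_match({"os2", "announce"}, original_query))         # filtered out
```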

As the searcher thinks about the problem of differentiating product announcements from reviews of products, this input from FIRE is very useful. FIRE has observed that the reviews of OS/2 and Windows NT tend to mention the companies that produce those systems (IBM and Microsoft), whereas product announcements for systems that run on these platforms rarely mention the companies that make the operating systems. The other two suggestions may not be entirely clear at first glance, so the searcher asks for an explanation of the word operating.


 graphic of suggestion explanation


The system then responds with a display of the term in context, using color-coding to indicate whether each occurrence appears in a relevant or a trash document. The context immediately makes clear that operating is part of the phrase operating system. Now we realize that reviews of OS/2 and Windows NT tend to use the phrase operating system, while product announcements do not.
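A term-in-context display of this kind can be sketched as follows. This is an illustration only, not FIRE's implementation. It assumes documents are plain strings paired with a "relevant" or "trash" label, and it returns that label as text where the real display uses color. The function name `term_in_context` and the `window` parameter are hypothetical.

```python
def term_in_context(term, docs, window=3):
    """Return (label, snippet) pairs showing each occurrence of `term`
    with up to `window` words of context on either side.

    docs: list of (label, text) pairs, label being "relevant" or "trash".
    """
    snippets = []
    for label, text in docs:
        words = text.split()
        for i, word in enumerate(words):
            # Compare case-insensitively, ignoring trailing punctuation.
            if word.lower().strip(".,;:!?") == term:
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                snippets.append((label, f"{left} [{word}] {right}".strip()))
    return snippets


# Example sentences invented for illustration.
docs = [
    ("relevant", "OS/2 is a solid operating system for servers"),
    ("trash", "Announcing a new spreadsheet"),
]
for label, snippet in term_in_context("operating", docs, window=2):
    print(label, "|", snippet)
```

Seeing the neighboring words is what reveals that operating belongs to the phrase operating system.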

Finally, the searcher is confused by the suggestion of the term industry. When the searcher again asks for an explanation, FIRE shows the term in context, and we realize that its presence is not due to any pattern but is an anomaly of the particular documents that were rated.

The searcher concludes that all of the suggestions are appropriate except for industry and decides to refine the query.

