Past Almaden Project
The QUEST group is working on a variety of projects, including research on data mining
(especially classification and clustering), information management for e-commerce
(including privacy issues), P2P applications (specifically personal utilities), and
local-area distributed computation structures.
Associations
Given a database of transactions, where each transaction consists of a set of items,
discover all associations such that the presence of one set of items in a transaction
implies the presence of another set of items.
``30% of people who buy diapers also buy beer.''
Synthetic Data Generation Code
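To make the definition above concrete, here is a minimal Python sketch that measures how often one set of items implies another in a small transaction database. The transactions, the function names (`count_containing`, `rule_confidence`) and the figures they produce are invented for illustration; this is a brute-force count, not the QUEST group's algorithm or data.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def rule_confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing antecedent that also contain consequent."""
    with_antecedent = count_containing(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0
    with_both = count_containing(transactions, set(antecedent) | set(consequent))
    return with_both / with_antecedent


# Hypothetical transactions, each a set of purchased items.
transactions = [
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "bread"}),
    frozenset({"diapers", "beer"}),
    frozenset({"milk", "bread"}),
    frozenset({"diapers", "milk"}),
]

# Share of all transactions containing both item sets (the rule's support).
rule_support = count_containing(transactions, {"diapers", "beer"}) / len(transactions)
# Share of diaper-buying transactions that also contain beer (the rule's confidence).
rule_conf = rule_confidence(transactions, {"diapers"}, {"beer"})
```

A statement such as "30% of people who buy diapers also buy beer" corresponds to the confidence figure. A full association miner would enumerate candidate item sets and keep only the rules whose support and confidence exceed chosen thresholds.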
Classification
Given examples of objects belonging to different groups, develop a profile of each group
in terms of the attributes of the objects. This profile is then used to predict the group
of a new object.
``Buyers of expensive sports cars are typically young urban professionals, whereas
luxury sedans are bought by elderly wealthy persons.''
Synthetic Data Generation Code
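As one way to picture "profile, then predict", the sketch below uses a nearest-centroid classifier: each group's profile is simply the average of its members' attributes, and a new object is assigned to the group with the closest profile. This technique is chosen here only for brevity and is not claimed to be the one used by the QUEST group; the attributes (age, income in thousands) and every data point are made up.

```python
import math
from collections import defaultdict


def build_profiles(examples):
    """Return {group: mean attribute vector} from (attributes, group) pairs."""
    sums = {}
    counts = defaultdict(int)
    for attrs, group in examples:
        if group not in sums:
            sums[group] = [0.0] * len(attrs)
        for i, value in enumerate(attrs):
            sums[group][i] += value
        counts[group] += 1
    return {g: tuple(s / counts[g] for s in sums[g]) for g in sums}


def predict(profiles, attrs):
    """Assign attrs to the group whose profile is nearest (Euclidean distance)."""
    return min(profiles, key=lambda g: math.dist(profiles[g], attrs))


# Hypothetical training examples: (age, income in thousands) -> car bought.
examples = [
    ((28, 90), "sports car"),
    ((32, 110), "sports car"),
    ((26, 85), "sports car"),
    ((62, 150), "luxury sedan"),
    ((70, 180), "luxury sedan"),
    ((66, 160), "luxury sedan"),
]

profiles = build_profiles(examples)
young_buyer = predict(profiles, (30, 100))
older_buyer = predict(profiles, (65, 170))
```

In practice the attributes would need to be scaled to comparable ranges, and a decision tree or rule learner would yield a profile that is easier to read than a vector of averages.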
Sequential Patterns
Given a database of transactions over a period of time, find inter-transaction
patterns such that the presence of a set of items is followed by another set of items.
``10% of people with diabetes develop a treatable loss in eyesight.''
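The following Python sketch illustrates the idea of one set of items being followed by another. It assumes each customer's history is a time-ordered list of transactions (sets of items) and checks a single candidate pattern by brute force. The data and the names `follows`, `pattern_support` and `pattern_confidence` are hypothetical, and a real miner would also have to generate the candidate patterns.

```python
def follows(sequence, first, second):
    """True if some transaction containing `first` is followed later by one containing `second`."""
    first, second = set(first), set(second)
    for i, basket in enumerate(sequence):
        if first <= basket:
            # Checking the earliest match is enough: it has the longest tail.
            return any(second <= later for later in sequence[i + 1:])
    return False


def pattern_support(sequences, first, second):
    """Fraction of all customers whose history contains first, then second."""
    return sum(follows(s, first, second) for s in sequences) / len(sequences)


def pattern_confidence(sequences, first, second):
    """Among customers with `first` in their history, the fraction later showing `second`."""
    first_set = set(first)
    with_first = [s for s in sequences if any(first_set <= b for b in s)]
    if not with_first:
        return 0.0
    return sum(follows(s, first, second) for s in with_first) / len(with_first)


# Hypothetical purchase histories, one time-ordered list per customer.
sequences = [
    [{"tent"}, {"sleeping bag"}, {"stove"}],
    [{"tent", "lantern"}, {"stove"}],
    [{"sleeping bag"}, {"tent"}],
    [{"stove"}],
]

support = pattern_support(sequences, {"tent"}, {"sleeping bag"})
confidence = pattern_confidence(sequences, {"tent"}, {"sleeping bag"})
```

A statement of the form "10% of people with X later develop Y" corresponds to the confidence figure, since it is measured only over the sequences that contain the first item set.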
Similar Time Sequences
Given a database of time sequences, find sequences similar to a given one, or find
all occurrences of similar sequences.
``The closing net asset value of the Harbor International mutual fund has been
similar to that of Ivy International and Scudder Global Fund.''
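One simple way to make "similar" precise is sketched below: each sequence is rescaled to zero mean and unit variance so that differences in price level and scale are ignored, and two sequences count as similar when the Euclidean distance between their rescaled forms falls under a threshold. This definition, the threshold and the series values are assumptions made for illustration, and the sketch requires sequences of equal length.

```python
import math


def normalize(seq):
    """Rescale a sequence to zero mean and unit variance."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def distance(a, b):
    """Euclidean distance between the normalized forms of two equal-length sequences."""
    return math.dist(normalize(a), normalize(b))


def similar_to(query, candidates, max_distance):
    """Names of candidate sequences within max_distance of the query, sorted by name."""
    return sorted(
        name for name, seq in candidates.items()
        if distance(query, seq) <= max_distance
    )


# Hypothetical closing values for a query series and three candidates.
query = [10, 11, 12, 11, 13]
candidates = {
    "fund_a": [20, 22, 24, 22, 26],       # same shape at twice the scale
    "fund_b": [13, 12, 11, 12, 10],       # moves in the opposite direction
    "fund_c": [5.0, 5.6, 6.0, 5.4, 6.6],  # roughly the same shape
}

similar = similar_to(query, candidates, max_distance=0.5)
```

Here `fund_a` and `fund_c` are reported as similar and `fund_b` is not. Finding all occurrences of a pattern inside longer sequences would additionally require sliding a window over each candidate.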
Application Examples
A direct mailer wants to maximize cross-selling opportunities. By applying the
Associations and Sequential Patterns techniques to historical order data, the direct
mailer can find out which articles sell together and which articles are bought in
sequence over time. The mailer uses this information to decide where to place articles
in the catalog and which flyers to attach to a bill.
A retailer wants to optimize purchasing and store-keeping. By applying the Similar
Time Sequences technique, the retailer can find groups of products that have similar
forecasted seasonal sales for next year and use this information for combining purchases
and inventory replenishment.
A bank wants to assess the creditworthiness of its customers. By analyzing the
loan-history records with the Classification technique, the bank gets a precise
profile of high-, medium-, and low-risk customers.
An auto insurer wants to study lapsing and retention among its customers. By applying
the Sequential Patterns technique, the insurer can understand what events lead to lapses.
A medical insurer is interested in detecting insurance fraud. By applying the
Associations technique, the insurer can determine whether there is a ring of providers
ping-ponging patients among themselves.
The above is only a sampling of the many cross-industry tasks that can be enhanced by
using these technologies. More than one technique can be applied to an industry and
more than one industry can benefit from a technique.