IBM Research

Business Insights Workbench

Enterprises understand that timely, accurate knowledge can improve business performance. Two types of information, structured and unstructured, are central to improving the quantitative and qualitative value of the knowledge available to decision makers. The analysis of structured and unstructured data is expected to converge over time, which has the potential to redefine the industry's traditional approach to Business Intelligence (BI), search, and information management and analysis. Until now, text analysis has been significantly underutilized. Incorporating it into an information warehouse can enhance decision-making processes and give customers deeper insight into historical and extrapolated business conditions.

BI has leveraged the functionality, scalability and reliability of modern database management systems to build ever-larger data warehouses and to apply data mining techniques that extract business advantage from the vast amount of available enterprise data. Unstructured information technologies, while less mature than BI, can now combine today's content management systems and the Web with vastly improved search and text mining capabilities to derive more value from the explosion of textual information. These systems will blend over time, borrowing techniques from each other and inspiring new approaches that analyze data and text together seamlessly.

Many industries such as finance, healthcare and life sciences, and customer relationship management can benefit from the integrated analysis of both text and data. Trends in numeric data can reveal, for example, a slowdown in sales in a particular region during a certain time period. Analysis of related documents, such as customer complaints and product reviews, can help determine the cause of the slowdown.

Business Insights Workbench (BIW) is a solution from IBM's Almaden Research Center that brings structured and unstructured information mining together in a single platform. BIW identifies hidden patterns and trends in a wide range of data sources, such as customer interactions and web data. Here are some examples of BIW applications:

  • Intellectual Property Analysis
  • Information Fusion for Investment Intelligence
  • Analytics for market and business processes
  • Customer Relationship Management

BIW's key capabilities fall into two categories:

· Information Analytics

BIW contains a wide range of Information Analytics technologies, including clustering and categorization, interactive taxonomy generation, and various forms of relationship analysis over trends, structured fields, and unstructured data. In addition, BIW contains a number of annotation capabilities, such as rule-based and dictionary-based annotators for extracting semantic concepts from text, e.g., chemical annotators and bio-entity annotators. The core analytics capabilities are summarized below:

  • Search -- the ability to perform searches on unstructured text and metadata.
  • Clustering and Categorization -- the ability to automatically create, edit, visualize, and apply taxonomies.
  • Statistical Analysis -- the ability to find correlations, trends, and other interesting relationships hidden in information.
  • Visualization -- the ability to view information in an extensible variety of plots and graphs.
  • Fact Extraction/Links -- the ability to link documents together through relationships of terms, phrases, and metadata including people, places, things, and taxonomies.
  • Information Extraction -- automatically pulling semantic information from text.
      1. Dictionary-based annotation
      2. Rule-based and regular expression annotation
      3. Natural Language Processing and Machine Learning based annotation, e.g., Conditional Random Fields and Hidden Markov Models
      4. Combinations of the above
  • Snippets Analysis -- the ability to analyze a given topic at the level of word segments and sentences.
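
The dictionary-based and rule-based annotation styles listed above can be illustrated with a minimal Python sketch. It is not BIW's annotator API, which this page does not describe; the `Annotation` type, the `CHEMICAL_TERMS` dictionary, the `PATENT_RULE` pattern, and the `annotate` function are all hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    kind: str   # semantic type assigned to the match, e.g. "chemical"
    text: str   # surface text that was matched
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Hypothetical dictionary: surface term -> semantic type.
CHEMICAL_TERMS = {"aspirin": "chemical", "ibuprofen": "chemical"}

# Hypothetical rule: "US" followed by a seven-digit number.
PATENT_RULE = re.compile(r"\bUS\s?\d{7}\b")


def annotate(text):
    """Return dictionary and rule matches, ordered by position in the text."""
    annotations = []
    # Dictionary-based annotation: case-insensitive whole-word lookup.
    for term, kind in CHEMICAL_TERMS.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            annotations.append(Annotation(kind, m.group(), m.start(), m.end()))
    # Rule-based annotation: regular expression match.
    for m in PATENT_RULE.finditer(text):
        annotations.append(
            Annotation("patent_number", m.group(), m.start(), m.end())
        )
    return sorted(annotations, key=lambda a: a.start)
```

Calling `annotate("Aspirin is covered by US 1234567.")` yields one `chemical` annotation and one `patent_number` annotation, each carrying character offsets so that later steps can link the concept back to its source text.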

BIW analytics is typically performed as a sequence of steps. The following figure shows the core steps in BIW-based analysis. Typically, users start with an explore phase, using the BIW search tool to extract information from a data warehouse. Such exploration may combine structured features, annotations, and unstructured text indexes to select the information relevant to the topic of interest.

Next, an understand phase uses a document classification technology (also called taxonomy generation technology) to generate naturally occurring categories from the documents and to classify the selected documents into appropriate categories. The document classification technology uses an interactive clustering of the feature space that helps the domain expert refine the categorization if desired.
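
As a rough illustration of generating categories from documents, the sketch below groups texts by bag-of-words cosine similarity. The clustering algorithm BIW actually uses is not specified on this page; a simple single-pass "leader" clustering is swapped in here because it is short and deterministic. The function names and the `threshold` parameter are assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(docs, threshold=0.3):
    """Assign each document to the most similar existing cluster,
    or start a new cluster when none is similar enough."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc index]}
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]),
                   default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # accumulate term counts
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


def label(cluster, n=2):
    """Name a category by its n most frequent terms."""
    return [term for term, _ in cluster["centroid"].most_common(n)]
```

A domain expert could then refine the result, for example by merging two clusters' `members` lists or renaming a category, which corresponds to the interactive refinement step described above.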

Finally, the analyze phase uses a co-occurrence method that compares two taxonomies, or compares one taxonomy against a feature, against time, or against structured information. This allows a detailed category-by-category comparison between two different document sets or two different domain-specific conceptual frameworks.
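
One common way to realize a co-occurrence comparison is a contingency table that contrasts the observed number of documents in each category pair with the number expected if the two categorizations were independent. The sketch below makes that assumption; the exact statistic BIW uses is not stated here, and the `cooccurrence` function and its inputs are hypothetical.

```python
from collections import Counter


def cooccurrence(assign_a, assign_b):
    """Compare two categorizations of the same documents.

    assign_a, assign_b: dicts mapping a document id to a category name.
    Returns {(cat_a, cat_b): (observed, expected)}, where expected is the
    count predicted if the two categorizations were independent.
    """
    shared = assign_a.keys() & assign_b.keys()
    total = len(shared)
    pair_counts = Counter((assign_a[d], assign_b[d]) for d in shared)
    row_counts = Counter(assign_a[d] for d in shared)
    col_counts = Counter(assign_b[d] for d in shared)
    table = {}
    for (cat_a, cat_b), observed in pair_counts.items():
        expected = row_counts[cat_a] * col_counts[cat_b] / total
        table[(cat_a, cat_b)] = (observed, expected)
    return table
```

Cells where the observed count is well above the expected count point to category pairs worth inspecting, for example a complaint topic that is concentrated in one time period. The second argument can just as well map documents to a structured field, such as a quarter or a region, instead of a second taxonomy.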

Figure: BIW analysis process (explore, understand, and analyze)

· Information Warehousing

To enable Information Analytics, it is extremely important to identify the correct data sources, process them into a usable format, and load them into an appropriate information warehouse. Traditional Online Analytical Processing (OLAP) data warehouse models are designed specifically for housing structured data for Business Intelligence (BI) purposes. To allow an information warehouse to support traditional BI, advanced text mining, and combinations of the two, BIW extends the OLAP data warehouse model to incorporate unstructured data for both structured and text analysis. The key BIW information warehousing capabilities include the following:

  • Data Sources -- the ability to access various data sources that might be useful for analytics.
      1. Leveraging crawling technologies, such as IBM WebSphere Information Integrator OmniFind Edition.
      2. Building adaptors to work with different data formats.
      3. Accessing data sources directly through Web services, where the data source supports them.
  • ETL technologies (Extract, Transform, and Load) -- the ability to work with sources in different data formats, and to transform and load their data into a target information warehouse.
      1. The ability to work with a wide range of source data formats.
      2. The ability to work with a wide range of target databases and data stores.
      3. The ability to generate flexible target information warehouse data models.
      4. The ability to integrate the ETL process into a workflow for ease of use.
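
To make the idea of a warehouse that holds structured fields and text side by side more concrete, here is a minimal ETL sketch using Python's built-in `sqlite3` module. The schema, the table and column names, and the `build_warehouse` function are illustrative assumptions, not BIW's actual data model.

```python
import sqlite3


def build_warehouse(records):
    """Load raw feedback records into a small in-memory warehouse.

    records: iterable of dicts with "region", "amount" and "text" keys
    (a hypothetical source format).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE region (
            region_id INTEGER PRIMARY KEY,
            name      TEXT UNIQUE
        );
        CREATE TABLE document (
            doc_id    INTEGER PRIMARY KEY,
            region_id INTEGER REFERENCES region(region_id),
            amount    REAL,  -- structured measure
            body      TEXT   -- unstructured text
        );
    """)
    for rec in records:
        # Transform: normalize the region name and clean up the text.
        name = rec["region"].strip().title()
        body = rec["text"].strip()
        # Load: reuse the dimension row if the region already exists.
        conn.execute("INSERT OR IGNORE INTO region (name) VALUES (?)", (name,))
        (region_id,) = conn.execute(
            "SELECT region_id FROM region WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO document (region_id, amount, body) VALUES (?, ?, ?)",
            (region_id, float(rec["amount"]), body),
        )
    conn.commit()
    return conn
```

With both kinds of data in one store, a single query can filter on text and aggregate a structured measure, for instance joining `document` to `region`, selecting rows whose `body` mentions a keyword, and summing `amount` per region.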

The following figure shows the BIW ETL architecture. BIW works with a wide range of data sources and leverages IBM WebSphere Information Integrator OmniFind Edition to collect, parse, categorize and index structured and unstructured data. BIW also uses the IBM Unstructured Information Management Architecture (UIMA) toolkit to extract concepts, facts, and relationships from text, helping organizations extract insight and value from enterprise content assets.

BIW also integrates with IBM's Web Fountain Semantic Super Computer for crawling, searching, and indexing capabilities. BIW contains an ETL engine that processes structured and unstructured data and loads it into a data warehouse. The BIW analytics engine works on top of the data warehouse to combine structured and unstructured information analytics.

Figure: BIW ETL architecture (extract, transform, and load)


