
Multimodal Mining for Healthcare Group

AALIM Cardiac Clinical Decision Support

Diagnostic decision support is still very much an art for physicians in their practices today due to a lack of quantitative tools. With integrated information becoming available through large patient repositories, clinicians, patients and payers would like to go beyond a qualitative viewing of multimodal clinical data to a more quantitative analysis of information using all supporting clinical and imaging data. For example, physicians can now expect to correlate diagnostic information across patients with a similar condition to infer the effectiveness of a drug. Similarly, with large patient repositories, physicians can validate their diagnosis by seeing how well it agrees with the diagnoses of other physicians in the world who looked at similar cases. This can improve the practice of medicine by allowing physicians to exploit the experience of other physicians in treating similar cases, and to infer the prognosis and the outcome of their treatments. In general, such statistically guided decision support can allow physicians to see consensus opinions as well as differing alternatives, helping reduce the uncertainty associated with diagnosis. In the long run, this could lower diagnostic errors and improve the quality of care.

The IBM AALIM project pioneers a new direction in evidence-based medicine (EBM). AALIM is a decision support system for cardiology that exploits the consensus opinions of other physicians who have looked at similar patients to present statistical reports summarizing possible diagnoses. The key idea behind our statistical decision support system is the search for similar patients based on the underlying multimodal data. By finding similar patients (based on their multimodal exam data) in pre-diagnosed datasets, we believe that the diagnosis (and hence the treatment and outcomes) of a new patient can be inferred from those of the similar patients.

Traditionally, evidence-based medicine has been delivered through practice guidelines and literature surveys, manually assembled by clinicians as the best collective knowledge on a medical topic. While that is a valuable resource, an equally important source of comparative effectiveness information is the set of pre-diagnosed patient records already existing in an electronic medical record system. With AALIM's new approach, millions of patient records can be searched and a cohort of similar cases can be assembled on the fly for each patient. Statistics on these similar cases, their diagnoses, treatments, medications and outcomes can then be derived instantly. This new approach has tremendous potential: it shortens the time from the introduction of new treatments to the collection of evidence and its presentation to practitioners on demand. In the future it could become a very powerful tool in the hands of physicians and medical staff worldwide.

AALIM work is currently focused on the domain of cardiology, where a number of cardiac modalities have been analyzed for disease-specific information. A common framework of exam-based similarity search is used to identify related exams, their corresponding patients and their underlying disease distributions.

Finding similar patients using multimodal fusion

A general algorithm for fusing similarity information derived from multiple information sources was developed to demonstrate multimodal fusion in the AALIM system. Searches on individual modalities (e.g., EKG, echo videos, textual reports) return ranked lists of matching data elements. Since matching data elements can come from different time periods within a single patient's clinical history, this timeline is used to merge the results of the similarity searches. For each modality, a normalized similarity distribution is derived from the matches over a patient's timeline. The individual similarity distributions are then combined into a cumulative distribution of similarity, which is thresholded to record the intervals along the patient's timeline that are potentially good matches. A ranked list of time intervals with normalized scores is then generated. The distribution of disease labels and drug levels within those time intervals is then plotted, and a ranked list of peaks is reported as patient-specific co-morbidity associations.
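To make the fusion steps concrete, here is a minimal Python sketch for one candidate patient. It assumes, for illustration only, that each modality reports `(time_bin, score)` matches on a shared discretized timeline and that the cumulative distribution is the plain sum of the per-modality normalized distributions; the function names, the 10-bin timeline and the threshold are hypothetical, not AALIM's actual interface.

```python
def modality_distribution(matches, n_bins):
    """Spread one modality's (time_bin, score) matches over the timeline, normalized to sum 1."""
    dist = [0.0] * n_bins
    for t, score in matches:
        dist[t] += score
    total = sum(dist)
    return [v / total for v in dist] if total > 0 else dist


def fuse_modalities(per_modality_matches, n_bins):
    """Cumulative similarity: sum of the normalized per-modality distributions (assumed rule)."""
    fused = [0.0] * n_bins
    for matches in per_modality_matches:
        for i, v in enumerate(modality_distribution(matches, n_bins)):
            fused[i] += v
    return fused


def ranked_intervals(fused, threshold):
    """Threshold the fused curve; return (start, end, score) runs, best score first."""
    intervals, start = [], None
    for i, v in enumerate(fused):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            intervals.append((start, i - 1, sum(fused[start:i])))
            start = None
    if start is not None:  # close a run that reaches the end of the timeline
        intervals.append((start, len(fused) - 1, sum(fused[start:])))
    return sorted(intervals, key=lambda iv: iv[2], reverse=True)


# Hypothetical matches for one candidate patient on a 10-bin timeline.
ekg = [(2, 0.9), (3, 0.5)]
echo = [(3, 0.8), (8, 0.2)]
report = [(3, 1.0)]
fused = fuse_modalities([ekg, echo, report], n_bins=10)
print(ranked_intervals(fused, threshold=0.5))
```

Tallying disease labels and drug levels inside the returned intervals, and ranking patients by their best interval score, would follow the same pattern; they are left out to keep the sketch short.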

The resulting selection of similar patients yields a meaningful distribution of diseases. Consider a query patient whose major recorded health problems were hyperlipidemia and coronary atherosclerosis, with essential hypertension as a common existing condition. Using the patient's EKG waveforms and cardiac echo along with the textual reports, the AALIM clinical decision support system searches for similar exams and hence similar patients. The multimodal fusion algorithm merges the results from the individual modality matches. The disease labels associated with the matched patients are then unblinded, and their distribution is shown in the figure below. As can be seen, the existing conditions of the query patient are validated (shown in light blue). In addition, associated co-morbidities such as MI, congestive heart failure and diabetes are flagged by the decision support algorithm.


Cardiac Modality Analysis

The domain of cardiology is rich in multimodal imaging data. We analyze a variety of diagnostic exam modalities used on patients with cardiac diseases, ranging from the most common, i.e., EKG, to the most specialized, e.g., cardiac MRI. In each modality analysis, we look for disease-specific cues and how they manifest within the modality. For example, in echo videos we may look at septal wall motion to characterize hypokinesia. Similarly, we may look for characteristic shape patterns of bundle branch block in EKGs.

Echo Video Analysis

An echocardiogram depicts the spatio-temporal motion patterns of the various cardiac chambers and valves (see video below). It is a commonly used diagnostic imaging modality for identifying many heart diseases, including valvular abnormalities, hypertrophy and hypokinesia. Our work on echo video analysis involves sophisticated analytics for end-to-end processing of complete cardiac echo studies, including clip segmentation, view recognition, LV localization and automatic estimation of ejection fraction. In addition, by continuously tracking the left ventricle through the heart cycle, we have made it possible to measure volumetric trajectories from echo videos, a measurement that was previously possible only with more expensive imaging modalities such as cardiac MR.
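Ejection fraction follows from a volumetric trajectory by the standard definition EF = (EDV - ESV) / EDV. The sketch below assumes, as a simplification, that the end-diastolic volume (EDV) and end-systolic volume (ESV) are the maximum and minimum of a trajectory covering one heart cycle; the volumes shown are made up.

```python
def ejection_fraction(volumes):
    """Ejection fraction (%) from an LV volume trajectory covering one heart cycle.

    Assumes EDV is the largest and ESV the smallest volume in the trajectory.
    """
    if not volumes or min(volumes) < 0 or max(volumes) == 0:
        raise ValueError("need a non-empty trajectory of non-negative volumes")
    edv, esv = max(volumes), min(volumes)
    return 100.0 * (edv - esv) / edv


# Hypothetical LV volumes (mL) sampled across one cycle.
trajectory = [120, 100, 60, 50, 80, 118]
print(round(ejection_fraction(trajectory), 1))  # 58.3
```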

Disease-specific understanding of echocardiographic sequences requires accurate characterization of spatio-temporal motion patterns. We have been working on methods for automatic extraction and matching of spatio-temporal patterns from cardiac echo videos. Our first approach used the simple idea of capturing holistic heart motion through average velocity curves and finding patterns that separate normal from diseased hearts (see our ACM-MM05 paper for details).

Our next approach segmented the cardiac chambers using a motion-based normalized cut segmentation algorithm. We then registered the motion patterns of corresponding regions using active shape models. Volumetric trajectories and other measures were extracted from corresponding regions to separate normal from diseased heart motion. This method is described in our MICCAI07 paper.

Recently we have looked at direct disease recognition from cardiac chamber size and motion using active shape models for modeling the cardiac chambers. This is described in our viewpoints in echo videos

Modeling EKGs as Curves

An electrocardiogram (EKG) is an important and commonly used diagnostic aid in cardiovascular disease diagnosis.

Many disturbances in heart function show up as characteristic variations of the sinus rhythm waveform of Fig. 1b, and these can be used as cues to diagnose the disease. Fig. 1c shows such a modification of the EKG due to premature ventricular contraction, where the heart skips a beat only to beat very strongly in the next, causing a missed R segment. Physicians routinely make a diagnosis by simple visual examination of this waveform. It is common knowledge among physicians that patients with the same disease have similar-looking EKG shapes in the relevant channels.

The traditional way to describe an EKG has been to divide it into ad hoc segments called the P-Q-R-S-T segments, as shown in the figure. In disease cases, some of these segments may be missing or additional elements may be present. For example, the P wave may be inverted, or the R segment may break up into R and R'. In our approach, we try to capture the perceptual intuition about shape used by physicians by representing the EKG in each channel as a curve. The important fiducial features are then extracted at places where there is a significant change in curvature. The resulting description looks as shown below. With this descriptor we can cover various diseases and describe the variation of these features within disease classes without resorting to the ad hoc segmentation into P-Q-R-S-T segments. Representing EKG signals by this shape descriptor lets us bring a number of curve shape-matching techniques from computer vision to the matching of EKG signals.
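One way to locate significant changes in curvature on a sampled curve is sketched below, using the standard curvature of a graph y(x), k = y'' / (1 + y'^2)^(3/2), estimated with central differences. The thresholding and local-maximum rule are illustrative assumptions rather than the exact criterion from our work, and a real EKG channel would normally be smoothed first.

```python
def curvature(y, dx=1.0):
    """Signed curvature of a uniformly sampled curve y(x), via central differences.

    Endpoints are left at 0 because central differences are undefined there.
    """
    k = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        d1 = (y[i + 1] - y[i - 1]) / (2 * dx)              # first derivative
        d2 = (y[i + 1] - 2 * y[i] + y[i - 1]) / (dx * dx)  # second derivative
        k[i] = d2 / (1 + d1 * d1) ** 1.5
    return k


def fiducial_points(y, threshold, dx=1.0):
    """Indices where |curvature| is a local maximum at or above threshold."""
    a = [abs(v) for v in curvature(y, dx)]
    return [i for i in range(1, len(a) - 1)
            if a[i] >= threshold and a[i] >= a[i - 1] and a[i] >= a[i + 1]]


# Toy "wave": flat, linear rise, sharp peak, linear fall, flat.
y = [0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0]
print(fiducial_points(y, threshold=0.5))  # [2, 5, 8]
```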

Heart Sound Analysis

Heart auscultation, i.e., listening to the sounds produced by the heart, is a common practice in the screening of heart disease. Although different diseases produce characteristic sounds, forming a diagnosis based on sounds heard through a stethoscope is a skill that takes years to perfect. Auditory discrimination of heart sounds is inherently difficult because these sounds are faint and lie at the lower end of the audible frequency range. Our main intuition in this work was that visual representations of heart sounds can actually be more discriminative than auditory renderings. The figure below shows examples of heart sounds that can be distinguished by their visual appearance.

Cardiac MRI

Cardiac MRI is a specialized diagnostic modality used primarily to characterize the structure and function of the heart, valves and major vessels, especially in patients with myocardial infarction or coronary artery disease. We have recently looked at discriminating between patients with acute MI and normal patients by automatically analyzing bull's-eye representations of cardiac regions. A report on this work will be made available shortly.

Coronary Angiograms

We have recently begun investigating coronary angiograms in collaboration with cardiologists at the catheterization lab of a medical center in San Francisco. More on this work will be made available soon.

Echo Doppler imaging

We have been analyzing the echo Doppler images captured in an echo study to examine various cases of stenosis and regurgitation in the four heart valves (aortic, mitral, tricuspid, pulmonary). Details on this work will be provided soon.

Cardiac reports

A wealth of information lies in exam reports dictated by physicians. In cardiology, these are captured in EKG, echo and catheterization reports. In general, extracting feature phrases from conversational text would require natural language understanding. Fortunately, medical reports often mix technical jargon with disease indicators in the textual descriptions of a condition, so it is possible to do a "shallow parsing" by looking for specific indicators based on top-down domain knowledge. In our work, we extract disease labels, diagnostic parameters, treatments and outcomes from reports. We find disease labels in reports by searching for occurrences of the names of cardiac-related diseases from our reference disease collection. Variants of these phrases and abbreviations are retained based on domain knowledge. For example, mitral regurgitation and mitral insufficiency are variants, as are left ventricular hypertrophy and LVH. If any of these terms are found inside phrases in the above phrasal list, they are retained as potential phrasal candidates. These candidates are further filtered for the presence of modifiers and negative indicators using a pre-built list of indicator terms. This helps remove phrases such as "No evidence of mitral regurgitation seen" or "Mitral regurgitation was non-existent", or flag cases such as "severe aortic regurgitation" in the example below.
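The shallow-parsing idea can be sketched with regular expressions as follows. The tiny variant lexicon, negation patterns and severity modifiers are hypothetical stand-ins for the reference disease collection and the pre-built indicator list, and negation is scoped to the whole sentence, which is cruder than a real indicator filter (a sentence such as "no LVH, but severe mitral regurgitation" would be dropped entirely).

```python
import re

# Hypothetical mini-lexicon: canonical label -> known variants/abbreviations (lower case).
VARIANTS = {
    "mitral regurgitation": ["mitral regurgitation", "mitral insufficiency"],
    "left ventricular hypertrophy": ["left ventricular hypertrophy", "lvh"],
}
# Hypothetical negative-indicator patterns and severity modifiers.
NEGATIONS = [r"\bno\b", r"\bnon-existent\b", r"\bwithout\b", r"\babsent\b"]
SEVERITIES = ["mild", "moderate", "severe"]


def has_word(phrase, text):
    """True if phrase occurs in text as whole words."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def extract_findings(report):
    """Return (label, severity) pairs for disease mentions that are not negated."""
    findings = []
    for sentence in re.split(r"[.;\n]", report.lower()):
        for label, variants in VARIANTS.items():
            if not any(has_word(v, sentence) for v in variants):
                continue
            if any(re.search(p, sentence) for p in NEGATIONS):
                continue  # negated in the same sentence, e.g. "no evidence of ..."
            severity = next((s for s in SEVERITIES if has_word(s, sentence)), None)
            findings.append((label, severity))
    return findings


print(extract_findings(
    "No evidence of mitral regurgitation seen. Severe LVH noted. Mild mitral insufficiency."))
# [('left ventricular hypertrophy', 'severe'), ('mitral regurgitation', 'mild')]
```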

Cardiac Imaging Search and Retrieval

Leveraging our background in content-based retrieval technologies, we have now addressed a number of cardiac modalities from a search perspective. The general problem we address for each modality can be stated as follows. Given a database of disease-labeled modality data (e.g., a heart sound database or an echo video database) and query modality data (e.g., a query echo video from a patient), we find similar modality data and build a distribution of the associated disease labels. This helps clinical decision support in two ways. First, it helps validate the disease label of the query data (the current diagnosis for a patient). Second, it can flag associated diseases (co-morbidities) or even alternate interpretations for which there is evidence in the given data. Doing this meaningfully requires carefully modeling the within-class variations of a disease across patients as well as finding discriminative measures for the various disease classes. The fact that a patient may have multiple diseases simultaneously makes this problem all the more challenging.
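The label-distribution step can be illustrated with a k-nearest-neighbor sketch. It assumes each exam has already been reduced to a feature vector (a hypothetical representation) and that each case carries a set of disease labels, so the fraction reported for a label is the share of the k nearest cases carrying it; because patients can have several diseases, these fractions need not sum to 1.

```python
import math
from collections import Counter


def knn_label_distribution(query, database, k):
    """database: list of (feature_vector, set_of_labels).

    Returns {label: fraction of the k nearest cases that carry the label}.
    """
    ranked = sorted(database, key=lambda item: math.dist(query, item[0]))
    nearest = ranked[:k]
    counts = Counter(label for _, labels in nearest for label in labels)
    return {label: c / len(nearest) for label, c in counts.most_common()}


# Hypothetical 2-D exam features with multi-label diagnoses.
db = [((0.0, 0.0), {"LBBB"}),
      ((0.1, 0.0), {"LBBB", "HTN"}),
      ((0.0, 0.2), {"HTN"}),
      ((5.0, 5.0), {"MI"})]
print(knn_label_distribution((0.0, 0.0), db, k=3))
```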

Echo Video Search

Cardiac echo videos are an important source of diagnostic information in cardiac decision support. During an echocardiographic exam, the operator makes important measurements, such as the area of the left ventricle, the velocity of the Doppler flow jet and the mitral valve area, which appear in the video frames as text feature-value pairs, as shown below. We have found as many as 146 measurements that are relevant for various cardiac diseases. Our approach to finding similar cardiac echos uses these measurements to build disease models. Some of this work is described in our ICDAR09 paper.
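A simple way to compare exams by their overlaid measurements is sketched below: parse `name: value` pairs from the (already OCR'd) frame text, then take an RMS of scaled differences over the measurements both exams share. The text format, the per-measurement scales and the distance are all assumptions for illustration; the disease models described in the ICDAR09 paper are not reproduced here.

```python
import math
import re

# "name: value" or "name = value" pairs as they might appear in OCR'd frame text (assumed format).
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?)\s*[:=]\s*(-?\d+(?:\.\d+)?)")


def parse_measurements(text):
    """Map lower-cased measurement names to float values."""
    return {name.strip().lower(): float(value) for name, value in PAIR.findall(text)}


def measurement_distance(a, b, scale):
    """RMS of scaled differences over measurements present in both exams (inf if none)."""
    shared = [m for m in a if m in b and m in scale]
    if not shared:
        return math.inf
    return math.sqrt(sum(((a[m] - b[m]) / scale[m]) ** 2 for m in shared) / len(shared))


query = parse_measurements("LV Area: 35.2 MV Area = 3.1")
candidate = {"lv area": 30.2, "mv area": 3.1}
scale = {"lv area": 10.0, "mv area": 1.0}  # hypothetical per-measurement scales
print(query, measurement_distance(query, candidate, scale))
```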

EKG Search

EKGs (in the corresponding channel) of patients in the same disease class are similar in visual appearance. The figure below shows the similarity in appearance of EKGs within the bundle branch block class. Our approach to EKG similarity search is to capture this perceptual similarity in appearance through a computational algorithm. This can help retrieve patients who have a disease similar to the current patient's and aid a physician in diagnostic decision support. In our work, we capture the disease-specific variations in EKG shape using a constrained non-rigid translation transform. The transform is recovered using a variant of dynamic shape warping, and the warping distance is also used to form the shape similarity metric for retrieval. Details are in our EMBC'07 paper.
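For reference, here is textbook dynamic time warping (DTW) with an optional Sakoe-Chiba band as the constraint, used to rank a toy database. This is a standard stand-in, not the dynamic shape warping variant or the constrained non-rigid translation transform from the EMBC'07 paper; the sequences and class labels are made up.

```python
def dtw_distance(a, b, window=None):
    """DTW distance between 1-D sequences with |x - y| local cost.

    window: optional Sakoe-Chiba band half-width; it is widened to |len(a) - len(b)|
    so that a warping path always exists.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def retrieve(query, database, k=3, window=None):
    """Return the k (label, distance) database entries closest to the query."""
    scored = [(label, dtw_distance(query, seq, window)) for label, seq in database]
    return sorted(scored, key=lambda item: item[1])[:k]


# Hypothetical single-channel beats; the first is a time-stretched copy of the query.
db = [("LBBB", [0, 1, 1, 2, 1, 0]), ("normal", [0, 0, 3, 0, 0])]
print(retrieve([0, 1, 2, 1, 0], db, k=1))  # [('LBBB', 0.0)]
```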

Heart Sound Search

In this work, we propose the idea of finding similar heart sounds in a database. To develop a fast matching algorithm, we approximate the heart sounds by their envelope curves, as shown in the figure below. A candidate matching heart sound is found in the database by a non-rigid warping of the time and spatial dimensions for pairs of envelope curves, as shown in the figure below. By matching the envelope shape signatures, we are able to retrieve heart sounds of matching diseases. More details of this work are in our ECCV08 paper.
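The envelope idea can be illustrated with the average Shannon energy, -x^2 log x^2, computed over short frames of a peak-normalized signal. This is a commonly used heart-sound envelope swapped in for illustration; the envelope used in the ECCV08 work may differ, and the frame and hop sizes are arbitrary. The resulting envelope curves could then be compared with an elastic matching method such as dynamic time warping.

```python
import math


def shannon_envelope(signal, frame, hop):
    """Frame-wise average Shannon energy, -x^2 log x^2, of a peak-normalized signal."""
    peak = max((abs(s) for s in signal), default=0.0) or 1.0
    envelope = []
    for start in range(0, len(signal) - frame + 1, hop):
        energy = 0.0
        for s in signal[start:start + frame]:
            x2 = (s / peak) ** 2
            if x2 > 0.0:  # the limit of -x^2 log x^2 at 0 is 0
                energy -= x2 * math.log(x2)
        envelope.append(energy / frame)
    return envelope


# Hypothetical toy signal: silence, a short "heart sound" burst, silence.
sig = [0.0] * 8 + [1.0, -1.0, 0.5, -0.5] * 2 + [0.0] * 8
print([round(e, 3) for e in shannon_envelope(sig, frame=4, hop=4)])
# [0.0, 0.0, 0.173, 0.173, 0.0, 0.0]
```

Shannon energy emphasizes medium-intensity samples and attenuates both low-level noise and the very largest peaks (a normalized sample of magnitude 1 contributes 0), which tends to give smoother envelopes than raw amplitude.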

Medical Image Segmentation and Registration

Members of the group have experience in segmentation and registration of general medical images including brain imaging. We now describe the various approaches to segmentation and registration that we have taken over the years.

Patch-based Segmentation

Patches are commonly used for the classification of objects. In recent work, patches are used to learn specific object boundaries, thus supporting an object segmentation task. In a recent paper published in ISBI 2009, a procedure is shown for the automatic extraction and segmentation of a class-specific object (or region) by learning class-specific boundaries. The method is presented and evaluated with a specific focus on the detection of lesion regions in uterine cervix images. The watershed segmentation map of the input image is modeled using an MRF. The local pairwise factors on the arcs of the watershed map indicate whether each arc is part of the object boundary. The final lesion region segmentation is obtained by applying loopy belief propagation to the watershed arc-level MRF. Experimental results on real data show state-of-the-art segmentation results on this very challenging task.

Simultaneous segmentation and registration of MR images using EM

Many neuroscience applications require the identification of structures with weakly visible boundaries in Magnetic Resonance (MR) images. We developed a statistical model combining the registration of an atlas with the segmentation of magnetic resonance images. Unlike other voxel-based classification methods, this framework models these problems as a single Maximum A Posteriori estimation problem, where the registration is defined by an object-specific affine mapping representation. The examples below illustrate different models whose solution is determined via instances of the Expectation-Maximization (EM) algorithm.

A study empirically demonstrates the benefit of performing segmentation and registration simultaneously rather than sequentially. For more information, please read our NeuroImage'06 journal paper. The algorithm is based on a probabilistic model with a prior defined by a statistical shape atlas. The atlas is built through Principal Component Analysis (PCA) on a set of LogOdds maps, which captures covariant shape deformations of neighboring structures. Structure boundaries, anatomical labels and image inhomogeneities are estimated simultaneously within an Expectation-Maximization formulation. For more information, please read our IPMI'07 paper.
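LogOdds maps are the logit, log(p / (1 - p)), of per-voxel probabilities. Because they form a vector space, linear operations such as averaging (and PCA) are well defined on them, and results map back to probabilities through the logistic function. The sketch below shows only the averaging step on hypothetical three-voxel maps; the PCA used to build the shape atlas is omitted, and the clipping constant `eps` is an assumption to keep values finite.

```python
import math


def logit(p, eps=1e-6):
    """LogOdds of a probability, clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def logistic(t):
    """Inverse of logit: map a LogOdds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def mean_in_logodds(prob_maps):
    """Voxel-wise mean of probability maps, computed in LogOdds space."""
    n = len(prob_maps)
    return [logistic(sum(logit(m[v]) for m in prob_maps) / n)
            for v in range(len(prob_maps[0]))]


# Two hypothetical 3-voxel probability maps for one structure.
maps = [[0.5, 0.9, 0.99], [0.5, 0.1, 0.9]]
print([round(p, 3) for p in mean_in_logodds(maps)])  # [0.5, 0.5, 0.968]
```

Averaging in LogOdds space is not the same as averaging probabilities: the third voxel gives about 0.968 rather than the arithmetic mean 0.945.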

Model-based segmentation

In medical imaging applications, segmentation can be a daunting task due to possibly large inhomogeneities in image intensities across an image, e.g., in MR images. These inhomogeneities, combined with volume averaging during imaging and the possible lack of precisely defined shape boundaries for certain anatomical structures, complicate the segmentation problem immensely. One possible solution in such situations is atlas-based segmentation.

In this framework, we proposed a unified variational principle that simultaneously registers the atlas shape (contour/surface) to the novel brain image and segments the desired shape (contour/surface) in the novel image. In this work, the atlas serves as a prior in the segmentation process, and registering this prior to the novel brain scan assists in segmenting it. Another key strength of our proposed registration+segmentation scheme is that it accommodates image pairs with very distinct intensity distributions, as in multimodality data sets.

Entropic Measures for Multiple Point-set Registration

Registering multiple point sets is a problem that often occurs in the registration of MR and CT images. Because of the complexity of the problem, it is desirable to avoid a detailed search for corresponding features under non-rigid motion. Our approach belongs to the general class of approaches that avoid explicit point correspondences for non-rigid alignment by using divergence measures between probability distributions formed around the point sets. Specifically, we have demonstrated the use of several divergence measures, ranging from Jensen-Shannon (JS) and Jensen-Renyi (JR) to the Generalized L2 divergence, to measure the cohesiveness between the density functions when recovering the non-rigid deformation. The density-based approaches are relatively robust to shapes of different sizes and to the presence of missing features. Furthermore, if an unbiased information-theoretic measure is chosen to quantify the multiple densities representing the shapes, the matching results can potentially be unbiased toward any of the given point sets.
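The density-based idea can be sketched as follows: place an isotropic Gaussian on every point, evaluate the resulting densities on a common grid, and compute the generalized Jensen-Shannon divergence H(sum_i w_i p_i) - sum_i w_i H(p_i). Only the divergence evaluation is shown; the optimization over non-rigid deformations that would minimize it is omitted, and the grid, bandwidth `sigma` and toy shape are illustrative choices.

```python
import math


def grid_density(points, grid, sigma):
    """Discrete density on grid: isotropic Gaussians at each point, normalized to sum 1."""
    vals = [sum(math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2 * sigma ** 2))
                for px, py in points)
            for gx, gy in grid]
    total = sum(vals)
    return [v / total for v in vals]


def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(v * math.log(v) for v in p if v > 0)


def jensen_shannon(dists, weights=None):
    """Generalized JS divergence: H(sum_i w_i p_i) - sum_i w_i H(p_i)."""
    w = weights or [1.0 / len(dists)] * len(dists)
    mixture = [sum(wi * d[k] for wi, d in zip(w, dists)) for k in range(len(dists[0]))]
    return entropy(mixture) - sum(wi * entropy(d) for wi, d in zip(w, dists))


def shift(points, dx):
    """Translate a 2-D point set along x."""
    return [(x + dx, y) for x, y in points]


grid = [(0.25 * i, 0.25 * j) for i in range(-8, 25) for j in range(-8, 16)]
shape = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
p = grid_density(shape, grid, sigma=0.5)
near = grid_density(shift(shape, 0.5), grid, sigma=0.5)
far = grid_density(shift(shape, 3.0), grid, sigma=0.5)
print(jensen_shannon([p, near]), jensen_shannon([p, far]))
```

Shifting the shape further increases the divergence, which is the signal a registration would drive down; with equal weights, the value for two distributions is bounded by log 2, and the same function accepts more than two densities for the multiple point-set case.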

Content-based Medical Image Retrieval

Members of this group worked on general medical image retrieval before coming into the field of cardiac imaging. Here we report on a sampling of this work, relating to a generalized patch-based framework for the retrieval of medical images and to Gaussian mixture models.

GMM-KL Framework for X-ray image retrieval

The GMM-KL framework provides an automatic image-to-image matching scheme that combines a localized and continuous image representation via Gaussian mixture modeling (GMM) with information-theoretic image matching via the Kullback-Leibler (KL) measure. The GMM-KL framework was applied to X-ray imagery in our TITB (IEEE Transactions on Information Technology in Biomedicine) paper.
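The KL divergence between two Gaussian mixtures has no closed form, so one common approach is a Monte Carlo estimate: sample from p and average log p(x) - log q(x). The 1-D sketch below illustrates this; the mixtures, the sample count and the `1e-300` floor that avoids log(0) are assumptions, and the GMM-KL framework itself works with multi-dimensional image features.

```python
import math
import random


def gmm_pdf(x, gmm):
    """Density of a 1-D mixture given as a list of (weight, mean, std)."""
    return sum(w * math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, mu, s in gmm)


def gmm_sample(gmm, rng):
    """Draw one sample: pick a component by weight, then sample its Gaussian."""
    r, acc = rng.random(), 0.0
    for w, mu, s in gmm:
        acc += w
        if r <= acc:
            return rng.gauss(mu, s)
    _, mu, s = gmm[-1]  # guard against weights summing to slightly below 1
    return rng.gauss(mu, s)


def kl_monte_carlo(p, q, n=20000, seed=0):
    """Monte Carlo estimate of KL(p || q) using n samples drawn from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = gmm_sample(p, rng)
        total += math.log(max(gmm_pdf(x, p), 1e-300)) - math.log(max(gmm_pdf(x, q), 1e-300))
    return total / n


# Single-Gaussian sanity check: KL(N(0,1) || N(1,1)) is analytically 0.5.
print(kl_monte_carlo([(1.0, 0.0, 1.0)], [(1.0, 1.0, 1.0)]))
image_a = [(0.6, 0.0, 1.0), (0.4, 4.0, 0.5)]  # hypothetical image models
image_b = [(0.5, 0.5, 1.0), (0.5, 4.0, 0.8)]
print(kl_monte_carlo(image_a, image_b))
```

A smaller estimate means the query image model is closer to the database image model, so database images can be ranked by this value.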

Patch-based Framework for matching

In the last several years, "patch-based" representations and "bag-of-features" classification techniques have been proposed for general object recognition tasks. In these approaches, the basic entity shifts from the pixel to a "patch", a small window centered on the pixel. We have been taking a patch-based approach to the segmentation of medical images. A very large set of patches is extracted from an image. Each small patch gives a localized "glimpse" of the image content; a collection of thousands or more such randomly selected patches can capture the entire image content (similar to a puzzle being formed from its pieces). A dictionary of words is learned over a large collection of patches extracted from a large set of images. Once a global dictionary is learned, each image is represented as a collection of words (also known as a "bag of words" or "bag of features") using an indexed histogram over the defined words. The matching between images, or between an image and an image class, can then be defined as a distance measure between the representative histograms. To categorize an image as belonging to a certain image class, well-known classifiers such as k-nearest neighbor and support vector machines (SVM) are used.
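The histogram representation and matching can be sketched as below: each patch is assigned to its nearest dictionary word, the image becomes a normalized word histogram, and a k-nearest-neighbor vote over L1 histogram distances picks the class. The dictionary is given rather than learned (e.g., by k-means), and the 2-D "patches", the L1 distance and the class names are hypothetical; the SVM option is not shown.

```python
import math
from collections import Counter


def nearest_word(patch, dictionary):
    """Index of the dictionary word closest (Euclidean) to a patch vector."""
    return min(range(len(dictionary)), key=lambda k: math.dist(patch, dictionary[k]))


def bow_histogram(patches, dictionary):
    """Normalized bag-of-words histogram of an image's patches."""
    counts = Counter(nearest_word(p, dictionary) for p in patches)
    return [counts[k] / len(patches) for k in range(len(dictionary))]


def l1(h1, h2):
    """L1 distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def knn_classify(hist, labeled_hists, k=1):
    """Majority label among the k training histograms nearest to hist."""
    ranked = sorted(labeled_hists, key=lambda item: l1(hist, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


dictionary = [(0.0, 0.0), (1.0, 1.0)]  # hypothetical 2-word dictionary
train = [(bow_histogram([(0.1, 0.0), (0.0, 0.2), (0.1, 0.1)], dictionary), "chest"),
         (bow_histogram([(0.9, 1.0), (1.0, 0.8)], dictionary), "hand")]
query = bow_histogram([(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)], dictionary)
print(knn_classify(query, train))  # chest
```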

Work on the classification of large radiograph archives, using a local patch representation of the image content, a bag-of-features approach for defining image categories and a kernel-based SVM classifier, was recently published at the ISBI conference.

In the international CLEF 2009 competition, the system was ranked first in discriminating orientation and body regions in X-ray images.

Brain Imaging Research

Although the current focus of our group is on cardiac imaging, members of the group have considerable experience in brain imaging research. We now highlight some of the work done by group members in this area.

Computerized Tools for Brain MRI Segmentation and Image Analysis in Multiple Sclerosis

MRI is the major imaging technique for diagnosing and monitoring disease activity in the brain. One of the diseases monitored using MR is multiple sclerosis (MS), a common non-traumatic neurological disease in young adults. Statistical modeling tools have been developed to advance automated MR image segmentation and analysis. Novel algorithms were developed for 3D segmentation of brain tissues, detection of MS lesions and tracking of the lesion load over time. A major effort was made to enable multimodal data analysis, including the fusion of information across several MRI sequences (T1, T2, PD, FLAIR). Statistical models were developed to characterize the data automatically and adaptively. These include parametric pattern-recognition methods, known as Gaussian mixture models (GMM), and non-parametric methods, such as mean-shift modeling.
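As a minimal illustration of GMM-based tissue classification, here is textbook EM for a 1-D Gaussian mixture over voxel intensities. It is a sketch under simplifying assumptions: a single channel (multi-sequence data such as T1, T2, PD and FLAIR would need multivariate Gaussians), no spatial priors, user-supplied initial means, and made-up intensities.

```python
import math


def em_gmm_1d(x, init_means, iters=50):
    """Fit a 1-D Gaussian mixture to intensities x with EM; return params and hard labels."""
    k, n = len(init_means), len(x)
    mean_x = sum(x) / n
    data_var = sum((xi - mean_x) ** 2 for xi in x) / n or 1.0
    mu, var, pi = list(init_means), [data_var] * k, [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for xi in x:
            p = [pi[j] * math.exp(-0.5 * (xi - mu[j]) ** 2 / var[j])
                 / math.sqrt(2 * math.pi * var[j]) for j in range(k)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component lost all its samples; keep its old parameters
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var[j] = max(sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, x)) / nj, 1e-6)
            pi[j] = nj / n
    # Hard labels from the last E-step's responsibilities.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return mu, var, pi, labels


# Hypothetical intensities from two tissue classes.
x = [10, 11, 9, 10, 12, 50, 51, 49, 50, 52]
mu, var, pi, labels = em_gmm_1d(x, init_means=[15.0, 60.0])
print([round(m, 1) for m in mu], labels)  # [10.4, 50.4] [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```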

Brain MRI Segmentation using Logarithm Odds Maps

Neuroscience studies frequently use geometric models of brain structures to detect morphological differences between patient groups. For example, according to a recent review of the literature, the most robust MR findings in schizophrenia are enlarged lateral ventricles (77% of studies), medial temporal lobe (amygdala-hippocampal complex and/or parahippocampal gyrus) volume reduction (77% of studies), and gray matter volume reduction of the superior temporal gyrus (100% of studies). These models are often based on manual segmentations made by experts who outline the structures of interest in MR images. This process is generally time consuming, and inter- and intra-operator variability also reduces the reliability of such studies. We address this issue by developing fully automatic, hierarchical segmentation algorithms that robustly parcellate images generated through standard clinical MRI acquisition into neuroanatomical structures. Our algorithms are specifically designed to overcome the relatively low signal-to-noise ratio often present in MR images. The publications from this work have received numerous awards, such as the Medical Image Analysis-MICCAI06 Best Paper Prize.

Diffusion tensor imaging of White Matter Tractographies

Recently, it was demonstrated that MS is diffuse in the brain and characterized by microstructural changes in the normal-appearing white matter (NAWM) and the gray matter (GM). Magnetic resonance imaging (MRI) depicts brain abnormalities in 90% of MS patients and is currently used for diagnosis. Applying quantitative image analysis techniques to brain MRI is important for predicting the MS disease course and for evaluating new drug efficacy in clinical trials. However, the diffuse changes in NAWM are not detected by conventional MRI. Thus, MS experts suggest that conventional MRI, although very important, is not sufficient, and that additional advanced MRI modalities, such as Diffusion Tensor Imaging (DTI), supported by additional computerized image processing tools, are required.

We are working on fiber-based registration as an alternative to existing volumetric registration techniques. The effect of MS on NAWM in predefined fiber tracts is being investigated. In ongoing work, we are developing a fast and reliable classification of WM DTI fibers into various pre-learned anatomical tracts, avoiding the need for tedious and time-consuming extraction of each fiber tract from each test brain. This is joint work with researchers at Tel Aviv University, Israel.

In a recent TMI paper, a robust approach to the registration of white matter tractographies extracted from DT-MRI scans is presented. The fibers are projected into a high-dimensional feature space based on the sequence of their 3D coordinates. Adaptive mean-shift (AMS) clustering is applied to extract a compact set of representative fiber modes (FM). Each FM is assigned a multivariate Gaussian distribution according to its population, leading to a Mixture of Gaussians (MoG) representation of the entire set of fibers. The registration between two fiber sets is treated as the alignment of two MoGs and is performed by maximizing their correlation ratio. A 9-parameter affine transform is recovered and then refined to a 12-parameter affine transform using an innovative mean-shift (MS) based registration refinement scheme presented in the paper. Validation of the algorithm on intra-subject data demonstrates its robustness against two main tractography artifacts: interrupted and deviating fiber tracts. Moreover, robustness to outliers is demonstrated, and examples of tract-of-interest (TOI) based registration are provided.
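Mean-shift mode seeking, the core of the clustering step, can be sketched as follows. This uses a fixed Gaussian bandwidth rather than the adaptive bandwidth of AMS, and the toy 2-D points stand in for fiber feature vectors (for example resampled, concatenated 3D coordinates, an assumed encoding); modes closer than `radius` are merged into one fiber mode.

```python
import math


def mean_shift_modes(points, bandwidth, iters=100, tol=1e-6):
    """Move each point uphill on a Gaussian kernel density estimate until it stops."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(iters):
            w = [math.exp(-math.dist(x, q) ** 2 / (2 * bandwidth ** 2)) for q in points]
            total = sum(w)
            new = [sum(wi * q[d] for wi, q in zip(w, points)) / total for d in range(len(x))]
            step = math.dist(new, x)
            x = new
            if step < tol:
                break
        modes.append(x)
    return modes


def merge_modes(modes, radius):
    """Group converged modes closer than radius; return cluster centers and labels."""
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) < radius:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical feature vectors forming two bundles.
points = [(0, 0), (0.2, 0), (0, 0.2), (5, 5), (5.2, 5), (5, 5.2)]
centers, labels = merge_modes(mean_shift_modes(points, bandwidth=0.5), radius=0.5)
print(labels)  # [0, 0, 0, 1, 1, 1]
```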

1. Clustering and Learning

The field of clustering and learning has become a busy one lately. Our work in this area tries to take a fresh perspective. Whereas most work in computer vision borrows techniques from machine learning, we try to do the opposite, i.e., bring pattern recognition techniques to machine learning. Examples of this work include:

Order-preserving clustering

For ordered data sets, such as time series and age-varying data, projecting the data as points in a multi-dimensional space may cause the order to be lost during clustering. Modeling the data with HMMs or auto-regressive models and clustering in parameter space usually loses sensitivity. In this work, we modeled ordered data sets as spatial curves, with the key idea of transforming the measurement of distance between points into the estimation of bends in the curves. See our CSB2003 paper for an application of this clustering method to gene expression time series.

Structural clustering

The most popular way to cluster is based on a distance constraint. However, distance alone can sometimes be a misleading criterion for clustering and classification. The figure below shows a cluster of points and two new points, A and B. Although A and B are equidistant from the centroid as well as from some supporting extremal points, A is more likely than B to belong to the cluster (and hence the class).

Thus, measures besides distance may have to be considered for clustering. In our work, we developed a new hierarchical unsupervised clustering algorithm that uses the principle of perceptual grouping, based on distance, orientation, density and spatial overlap of projected clouds, to group sparse point sets in multidimensional spaces. The resulting clusters look very different from those obtained by traditional clustering methods and in some sense capture the inherent shape perceived in the point distributions. See our CVPR2007 paper for a description of this clustering and its comparison to other methods.

Odd-man-out principle for clustering and categorization

We are now working on a new algorithm for clustering and categorization using the odd-man-out principle used in IQ tests. A paper on this will soon be available.

3. Location Hashing

Queries referring to content embedded within images are an essential component of content-based search, browse and summarize operations in image databases. Localizing such queries under changes in appearance, occlusion and background clutter is a difficult problem. This work presented a new method of indexing image databases, called location hashing, that uses a special data structure called the location hash tree (LHT) to organize feature information from the images of a database. Location hashing is based on the principle of geometric hashing and simultaneously determines the relevant images in the database and the regions within them that are most likely to contain a 2D pattern query, without incurring a detailed search of either. Because the location hash tree is a red-black tree, it allows efficient search for candidate locations using pose-invariant feature information derived from the query. See our CAIVD'99 paper from CVPR 1999 for more details. Joint work with Prabhakar Raghavan (now at Yahoo) and Nimrod Megiddo.
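The geometric-hashing principle behind location hashing can be sketched with affine-invariant coordinates: every ordered triple of feature points defines a basis, the remaining points are hashed by their quantized coordinates in that basis, and a query votes for (image, basis) entries. Here a Python dict replaces the red-black location hash tree, and the quantization step, point sets and names are illustrative assumptions; the winning basis also indicates where in the image the match lies.

```python
from collections import Counter, defaultdict
from itertools import permutations


def affine_coords(p, o, a, b):
    """Solve p - o = alpha*(a - o) + beta*(b - o); None for a degenerate basis."""
    ux, uy = a[0] - o[0], a[1] - o[1]
    vx, vy = b[0] - o[0], b[1] - o[1]
    det = ux * vy - uy * vx
    if abs(det) < 1e-12:
        return None  # collinear basis points
    px, py = p[0] - o[0], p[1] - o[1]
    return ((px * vy - py * vx) / det, (ux * py - uy * px) / det)


def key(c, step):
    """Quantize affine coordinates into a hash bin."""
    return (round(c[0] / step), round(c[1] / step))


def build_index(images, step=0.25):
    """Hash every non-basis point under every ordered basis triple of every image."""
    table = defaultdict(list)
    for image_id, pts in images.items():
        for basis in permutations(range(len(pts)), 3):
            o, a, b = (pts[i] for i in basis)
            for i, p in enumerate(pts):
                c = None if i in basis else affine_coords(p, o, a, b)
                if c is not None:
                    table[key(c, step)].append((image_id, basis))
    return table


def lookup(table, query_pts, step=0.25):
    """Vote for (image, basis) entries using the first three query points as the basis."""
    o, a, b = query_pts[:3]
    votes = Counter()
    for p in query_pts[3:]:
        c = affine_coords(p, o, a, b)
        if c is not None:
            votes.update(set(table.get(key(c, step), [])))  # one vote per entry per point
    return votes.most_common(1)[0] if votes else None


images = {"A": [(0, 0), (1, 0), (0, 1), (1, 1), (0.3, 0.7)],
          "B": [(0, 0), (4, 0), (0, 1), (2, 3)]}
table = build_index(images)
# Query: image A's points under the affine map (x, y) -> (2x + y + 5, x + 3y - 1).
query_pts = [(5, -1), (7, 0), (6, 2), (8, 3), (6.3, 1.4)]
print(lookup(table, query_pts))
```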

4. Region hashing

As image databases grow large, index structures for fast navigation become important. In particular, when the goal is to locate object queries in image databases under changes in pose, occlusion and spurious data, the traditional index structures used in databases become unsuitable. We developed an index structure called the interval hash tree for locating multi-region object queries in image databases, and demonstrated its utility for query localization in a large image database. See our CAIVD'99 paper for more details. Region hashing is a technique for indexing database images that consist of a collection of regions; in our work, we demonstrated it with color regions. The region layout was represented in an affine-invariant coordinate system, using affine intervals as bounding boxes for the regions. An interval hash tree was then built across images from the affine intervals of each image, allowing efficient localization of multi-region objects. Early work on using region hashing for slide localization in video is described in our CVPR 2000 paper.
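The interval hash tree itself is not reproduced here. As a rough illustration of the underlying idea, indexing region extents as intervals so that overlapping regions can be retrieved without scanning every image, this sketch builds a standard centered interval tree over the x-extents of region boxes and filters candidates on their y-extents. It assumes each region has already been reduced to an axis-aligned box in some affine-invariant frame; the class and function names, box coordinates and labels are hypothetical.

```python
class IntervalNode:
    """Centered interval tree over closed intervals (lo, hi, payload)."""

    def __init__(self, intervals):
        endpoints = sorted(x for lo, hi, _ in intervals for x in (lo, hi))
        self.center = endpoints[len(endpoints) // 2]
        # Intervals straddling the center stay here; the rest go left or right.
        self.here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def overlapping(self, lo, hi):
        """All stored intervals that intersect [lo, hi]."""
        found = [iv for iv in self.here if iv[0] <= hi and lo <= iv[1]]
        if self.left is not None and lo < self.center:
            found += self.left.overlapping(lo, hi)
        if self.right is not None and hi > self.center:
            found += self.right.overlapping(lo, hi)
        return found

def find_regions(tree, x_range, y_range):
    """Regions whose boxes intersect the query box: x via the tree, y by filtering."""
    ylo, yhi = y_range
    return {label for _, _, (b_lo, b_hi, label) in tree.overlapping(*x_range)
            if b_lo <= yhi and ylo <= b_hi}

# Hypothetical region boxes: ((x_lo, x_hi), (y_lo, y_hi), label).
boxes = [((0, 2), (0, 2), "img0:r0"), ((5, 7), (1, 3), "img0:r1"),
         ((1, 4), (5, 8), "img1:r0"), ((6, 9), (6, 9), "img1:r1")]
tree = IntervalNode([(x[0], x[1], (y[0], y[1], label)) for x, y, label in boxes])
print(sorted(find_regions(tree, (1.5, 6), (1, 2.5))))  # ['img0:r0', 'img0:r1']
```

Each query only descends into subtrees whose intervals could overlap it, which is what makes tree-based interval indexing faster than testing every region of every image.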

5. CueVideo

From 1997 to 2000, we worked on a series of techniques for representing and querying audio-video content in the CueVideo project. These included segmenting and recognizing the audio track, converting speech to text, offering text-based retrieval of the transcribed audio, and identifying topics in videos. We also summarized video content in several ways, including modeling user interest with HMMs to generate video previews automatically. See our ACMM2001 paper, among others.
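The HMM-based preview generation is described above only at a high level. The sketch below shows the decoding step such a model might use, assuming a two-state HMM (0 = uninteresting, 1 = interesting) over per-shot observations quantized to low or high activity. All probabilities are made-up placeholders, not values from the ACMM2001 paper: Viterbi decoding picks the most likely interest sequence, and the shots labeled interesting form the preview.

```python
import numpy as np

def viterbi(obs, start, trans, emit):
    """Most likely hidden-state sequence of a discrete HMM, computed in log space."""
    start, trans, emit = (np.asarray(m, dtype=float) for m in (start, trans, emit))
    log_trans = np.log(trans)
    logp = np.log(start) + np.log(emit[:, obs[0]])
    back = np.zeros((len(obs), len(start)), dtype=int)
    for t in range(1, len(obs)):
        scores = logp[:, None] + log_trans  # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        logp = scores.max(axis=0) + np.log(emit[:, obs[t]])
    # Trace the back-pointers from the best final state.
    path = [int(logp.argmax())]
    for t in range(len(obs) - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical model: state 0 = uninteresting, 1 = interesting;
# observation 0 = low-activity shot, 1 = high-activity shot.
start = [0.7, 0.3]
trans = [[0.8, 0.2], [0.3, 0.7]]
emit = [[0.9, 0.1], [0.2, 0.8]]
shots = [0, 0, 1, 1, 1, 0, 0]
path = viterbi(shots, start, trans, emit)
preview = [i for i, s in enumerate(path) if s == 1]
print(path, preview)  # [0, 0, 1, 1, 1, 0, 0] [2, 3, 4]
```

The transition probabilities favor staying in a state, so the decoded interest sequence is smoother than a shot-by-shot threshold would be.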


The work in QBIC (Query by Image Content) was one of the earliest efforts in content-based retrieval. It established the visual querying paradigm as distinctly different from SQL and other structured query languages for databases. In QBIC, we introduced the term 'image indexing' to refer to the retrieval of images containing object queries extracted from other images. This was in contrast to object indexing, which was popular in computer vision at the time for indexing databases of pre-segmented objects. In image indexing, both the images containing the object and the regions containing the object within those images had to be identified. See my early papers from SPIE 1995.

1. Action recognition

Work on the formal modeling of actions and events began in earnest at ICCV 2001, where we held the first Event workshop, Event 2001. Recognizing actions and activities has since become a mainstream topic in computer vision and multimedia.

Action Cylinders

In this work, we showed that actions could be recognized as objects. In particular, we showed that the successive projected images of a 3D object undergoing motion in space could be modeled as a generalized cylinder, called the action cylinder. Reliable recognition is achieved by recovering the viewpoint transformation between the reference (model) and given action cylinders. A set of 8 corresponding points from time-wise corresponding cross-sections is shown to be sufficient to align the two cylinders under perspective projection. A surprising conclusion from visualizing actions as objects is that rigid, articulated, and nonrigid actions can all be modeled in a uniform framework. See our ICCV-Event2001 workshop paper for details. Modeling actions as spatio-temporal shapes is now well accepted, with several follow-on works proposing similar concepts such as action sketches and space-time volumes. This original work was done jointly with Alex Vasilescu from MIT and Saratendu Sethi from Boston University.
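Alignment from eight correspondences under perspective projection is the setting of the classical eight-point algorithm for the fundamental matrix. The sketch below implements the normalized version of that algorithm, assuming the eight or more correspondences have already been taken from time-wise matching cross-sections of the two cylinders; whether the ICCV-Event2001 paper used exactly this estimator is an assumption, and the function names and synthetic cameras are hypothetical.

```python
import numpy as np

def normalize(pts):
    """Translate to zero mean and scale so the mean distance from the origin is sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return homog @ T.T, T

def eight_point(x1, x2):
    """Estimate F such that x2^T F x1 = 0 from >= 8 point correspondences."""
    p1, T1 = normalize(np.asarray(x1, float))
    p2, T2 = normalize(np.asarray(x2, float))
    # One row per correspondence: the epipolar constraint is linear in the 9 entries of F.
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)  # right singular vector of the smallest singular value
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt  # enforce rank 2
    F = T2.T @ F @ T1  # undo the normalization
    return F / np.linalg.norm(F)

def epipolar_residuals(F, x1, x2):
    """|x2^T F x1| for each correspondence (zero for exact data)."""
    h1 = np.column_stack([x1, np.ones(len(x1))])
    h2 = np.column_stack([x2, np.ones(len(x2))])
    return np.abs(np.einsum("ij,jk,ik->i", h2, F, h1))

# Synthetic check: 12 points seen by two hypothetical pinhole cameras.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(12, 3)) + np.array([0.0, 0.0, 5.0])
th = 0.1
R = np.array([[np.cos(th), 0, np.sin(th)], [0, 1, 0], [-np.sin(th), 0, np.cos(th)]])
t = np.array([1.0, 0.0, 0.0])
project = lambda P: P[:, :2] / P[:, 2:3]
x1, x2 = project(X), project(X @ R.T + t)
F = eight_point(x1, x2)
print(epipolar_residuals(F, x1, x2).max())
```

With noise-free synthetic correspondences the residuals are at floating-point level; with real tracked points a least-squares fit over more than eight correspondences, as done here, is the usual safeguard.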

Action Events

We developed approaches to recognize various types of action events and to retrieve videos containing them. Much of our work from 2001 to 2004 covers this aspect. See our ICCV03 and ACM Multimedia'03 papers. Joint work with Mubarak Shah's group at UCF.

Action segmentation

The average velocity curve is the axis of the action cylinder. In this work, we segmented activities by interpreting the scale-space image formed from this curve. See our ICPR06 paper.
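A scale-space image records where a feature of a signal occurs as the signal is smoothed at increasing scales. The sketch below assumes, hypothetically, that the segmentation cue is local minima of the speed along the average velocity curve (pauses between movements), computes them across Gaussian scales, and keeps the minima at the coarsest scale as segment boundaries. It is an illustration of the scale-space idea, not the ICPR06 procedure; the speed profile and scales are made up.

```python
import numpy as np

def gaussian_smooth(x, sigma):
    """Smooth a 1D signal with a sampled Gaussian, reflecting at the ends."""
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(np.pad(x, r, mode="reflect"), kernel, mode="valid")

def scale_space_minima(speed, sigmas):
    """Boolean image (scale x time): True where the smoothed speed has a local minimum."""
    image = np.zeros((len(sigmas), len(speed)), dtype=bool)
    for row, sigma in enumerate(sigmas):
        d = np.diff(gaussian_smooth(speed, sigma))
        # Minimum at t: the signal falls into t and does not fall out of it.
        image[row, 1:-1] = (d[:-1] < 0) & (d[1:] >= 0)
    return image

# Hypothetical speed profile of an average velocity curve: three strokes.
t = np.arange(120)
speed = sum(np.exp(-(t - c) ** 2 / (2 * 8.0**2)) for c in (20, 60, 100))
image = scale_space_minima(speed, sigmas=(1, 2, 4))
print(np.flatnonzero(image[-1]))  # [40 80]
```

Minima that survive to coarse scales mark significant pauses; minima that appear only at fine scales are usually noise, which is why the whole scale-space image is more informative than a single smoothing level.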

Application-specific action patterns

In this work, we applied action cylinders to look for characteristic motion patterns in cardiac echo videos, capturing both chamber and valvular motion. The average velocity curve (the axis of the cylinder) was used to form measures that discriminate between diseases. See our CinC 2006 paper. Joint work with Jing Yang from Yale University.

2. Object/shape recognition

Much of our work on object recognition has dealt with the selection problem, namely, locating where an interesting object may be in a scene. We have also looked at nonrigid shape models for handling flexible objects and capturing the shape variations within an object category.

Attentional Selection

My thesis addressed attentional selection in object recognition. In it, I developed computational models of visual attention for two modes, namely, 'attract attention' and 'pay attention'. We implemented these models using color, texture and line grouping to achieve fast selection of object-containing regions in images. Of these, the work on attentional selection using color has been cited most often. My Ph.D. thesis is available as MIT AI Lab Memo 1420.

Constrained affine shape models

In this work, we sought to model the shape variations among objects in a category (e.g., faces) as a collection of constrained affine transforms that obey spatial layout constraints with respect to a line of symmetry. This shape model also served as the category prototype for indexing large databases. An application of this model to document retrieval is described in our CAIVD'96 paper from CVPR'96. Joint work with Wei Zhu, University of Rochester.

Affine kernels

Affine kernels are an alternative to pyramid kernels. More on this work will be reported shortly.

3. Color and Texture

Early work on color image segmentation and color-region-based querying was done during my thesis research. It also included measuring the saliency of regions based on color and texture, a topic that is now becoming popular again.

Color Saliency

In this work, we exploited the perceptual categorization of colors. While humans can distinguish between thousands of color nuances, their memory for color is only approximate and falls into a handful of perceptual categories. Drawing an analogy to the color palettes used in painting, we designed a set of perceptual categories that allowed images to be described in terms of illuminant-invariant color descriptors. These descriptions were used both to segment images by color and to identify salient color regions. See our IJCV'97 paper for more details.
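The idea of collapsing many color nuances into a handful of categories can be sketched as nearest-prototype classification. This is a simplification under stated assumptions: the palette names and prototype values are hypothetical, and normalized chromaticity (dividing by R+G+B) only removes overall intensity; it is not the illuminant-invariant descriptor of the IJCV'97 paper.

```python
import numpy as np

# Hypothetical prototype chromaticities (r, g, b), each summing to 1.
PALETTE = {
    "red": (0.70, 0.15, 0.15),
    "green": (0.20, 0.60, 0.20),
    "blue": (0.20, 0.20, 0.60),
    "yellow": (0.45, 0.45, 0.10),
    "gray": (1 / 3, 1 / 3, 1 / 3),
}
NAMES = list(PALETTE)
PROTOTYPES = np.array([PALETTE[n] for n in NAMES])

def categorize(image):
    """Map each RGB pixel to the index of the nearest prototype chromaticity."""
    rgb = np.asarray(image, dtype=float)
    total = rgb.sum(axis=-1, keepdims=True)
    # Dividing by R+G+B removes overall intensity; black pixels default to gray.
    chroma = np.divide(rgb, total, out=np.full_like(rgb, 1 / 3), where=total > 0)
    dist = np.linalg.norm(chroma[..., None, :] - PROTOTYPES, axis=-1)
    return dist.argmin(axis=-1)

image = [[[200, 30, 30], [100, 100, 100]],
         [[50, 50, 50], [30, 30, 200]]]
print([[NAMES[i] for i in row] for row in categorize(image)])
# [['red', 'gray'], ['gray', 'blue']]
```

Because the bright and dark gray pixels land in the same category, a segmentation built on these labels groups them together, which is the kind of coarse, memory-like description the paragraph above argues for.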

Surface Color Classes

In this work, we gave a formal way of relating the apparent colors of different members of the same surface color class (e.g., skin color). This allowed stable segmentation and recognition of colored regions. See our ICPR'96 paper for more details. This was joint work with Yong-Qing Cheng from the University of Rochester.

Salient texture

What makes a texture region salient? This work, part of my thesis and described in a CVIU journal paper from 1999, was the first to address the issue of texture saliency using a combination of signal processing and perceptual organization.

Texture recognition

In this work, we explored the use of texture or pattern information on a 3D object as a cue to isolate image regions that are likely to come from the object. We developed a representation of texture based on the linear prediction (LP) spectrum that allows the model texture to be recognized under changes in orientation and under occlusion. See our BMVC'93 paper for details.
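The LP-spectrum representation can be illustrated in one dimension. This sketch fits a linear predictor to a single scan line with the autocorrelation method (Levinson-Durbin recursion) and evaluates the resulting all-pole spectrum. The predictor order, FFT length and synthetic striped texture are assumptions, and unlike the BMVC'93 representation the sketch makes no attempt at invariance to orientation or robustness to occlusion.

```python
import numpy as np

def lp_coefficients(x, order):
    """Levinson-Durbin: predictor A(z) = 1 + a1 z^-1 + ... + ap z^-p and residual power."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])  # autocorrelation
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lp_spectrum(x, order=4, nfft=64):
    """All-pole spectrum err / |A(e^jw)|^2 on nfft // 2 + 1 frequency bins."""
    a, err = lp_coefficients(x, order)
    return err / np.abs(np.fft.rfft(a, nfft)) ** 2

# Hypothetical scan line through a striped texture with a period of 8 pixels.
rng = np.random.default_rng(0)
t = np.arange(512)
row = np.sin(2 * np.pi * t / 8) + 0.1 * rng.normal(size=t.size)
print(np.argmax(lp_spectrum(row)))  # peak near bin 64 / 8 = 8
```

A low-order LP model summarizes the dominant periodicities of a texture in a few coefficients and yields a smooth spectrum, which makes it a compact signature to compare against a model texture.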

Retrieving color regions

This includes our early work on recognizing objects by their color region layout using region adjacency graphs (ECCV'92). Other work in this area includes color region hashing, described above.
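A region adjacency graph is straightforward to build from a segmented label image. This sketch, with hypothetical function names and toy data, links regions whose pixels are 4-connected neighbors and then applies a crude layout test: every queried pair of adjacent colors must occur somewhere in the image. That test is only a necessary condition and is not the graph matching used in the ECCV'92 work.

```python
import numpy as np

def region_adjacency_graph(labels):
    """Edges (u, v), u < v, between 4-connected pixels with different region labels."""
    labels = np.asarray(labels)
    edges = set()
    # Compare each pixel with its right neighbor and with the pixel below it.
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        diff = a != b
        for u, v in zip(a[diff].tolist(), b[diff].tolist()):
            edges.add((min(u, v), max(u, v)))
    return edges

def contains_color_layout(edges, color, query_pairs):
    """Necessary condition: every queried pair of adjacent colors occurs in the image."""
    present = {frozenset((color[u], color[v])) for u, v in edges}
    return all(frozenset(p) in present for p in query_pairs)

labels = [[0, 0, 1],
          [0, 2, 1],
          [2, 2, 1]]
color = {0: "red", 1: "blue", 2: "green"}
edges = region_adjacency_graph(labels)
print(sorted(edges))  # [(0, 1), (0, 2), (1, 2)]
print(contains_color_layout(edges, color, [("red", "green")]))   # True
print(contains_color_layout(edges, color, [("red", "yellow")]))  # False
```

A cheap test like this can prune images before a more expensive graph match compares the full layout.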
