Tanveer Syeda-Mahmood Almaden Research Center 
  Healthcare Informatics  
  Almaden Computer Science

Research Areas

My research has spanned over two decades in the field of content-based retrieval. In this field, I have focused on developing techniques that lead to practical applications of content-based retrieval while still extracting general methods for use in other computer vision applications. This includes flexible shape representation methods, indexing techniques for locating objects within images, modeling the actions of 3D objects in space and time, and applications as diverse as document analysis, bioinformatics, web services, and most recently, medical imaging.

Medical Imaging

The domain of cardiology is rich in multimodal imaging data. As part of the AALIM project, I am currently investigating multiple modalities for extracting disease-specific information. Some of this is joint work with members of my group in healthcare informatics at Almaden Research. The core research involves developing content-based retrieval techniques for medical imaging datasets.

EKG retrieval

An electrocardiogram (EKG) is an important and commonly used diagnostic aid in cardiovascular disease diagnosis.

Many disturbances in heart function show up as characteristic variations in the sinus rhythm waveform of Fig. 1b and can be used as cues to diagnose the disease. Fig. 1c shows such a modification in the ECG due to premature ventricular contraction, where the heart skips a beat only to beat very strongly in the next, causing a missed R segment. Physicians routinely make diagnoses by a simple visual examination of these ECG waveforms. It is common knowledge among physicians that patients with the same disease have similar-looking ECG shapes in the relevant channels.

We capture this fundamental intuition used by physicians in a computational algorithm for finding the shape similarity of ECG recordings. This can help retrieve patients that have a disease similar to the current patient's and aid a physician in diagnostic decision support.

In our work, we capture the disease-specific variations in the EKG shape using a constrained non-rigid translation transform. The transform is recovered using a variant of dynamic shape warping. The warping distance is also used to form the shape similarity metric for retrieval. Details are in our EMBC paper.
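To illustrate the alignment idea, here is a plain dynamic time warping distance on synthetic beats. This is a simplified stand-in, not the constrained dynamic shape warping of the EMBC paper; the signals and parameters below are made up for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D signals.
    A simplified stand-in for the constrained dynamic shape warping
    used in our EMBC paper (the actual transform differs)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy beats: an idealized R wave, a time-shifted copy, and an attenuated copy.
t = np.linspace(0, 1, 50)
beat = np.exp(-((t - 0.5) ** 2) / 0.005)
shifted = np.exp(-((t - 0.55) ** 2) / 0.005)
distorted = 0.3 * beat

# Warping tolerates the time shift much better than the shape change.
print(dtw_distance(beat, shifted) < dtw_distance(beat, distorted))
```

The point of warping-based matching is exactly this tolerance: two beats with the same shape but slightly different timing score as similar, while a genuinely different shape does not.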


Echo Video Analysis

An echocardiogram depicts spatio-temporal motion patterns of various cardiac chambers and valves (see video below). It is a commonly used diagnostic imaging modality for identifying many heart diseases, including valvular abnormalities, hypertrophy, hypokinesia, etc. Disease-specific understanding of echocardiographic sequences requires accurate characterization of spatio-temporal motion patterns. We have been working on methods for the automatic extraction and matching of spatio-temporal patterns from cardiac echo videos. Our first approach used a simple idea of capturing holistic heart motion through average velocity curves and finding patterns that separate normal from diseased hearts (see our ACM-MM05 paper).
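The holistic motion idea can be sketched with simple frame differencing on a synthetic clip; this is an assumed simplification (the actual work presumably used proper motion estimation), and the clip below is a made-up stand-in for echo video.

```python
import numpy as np

def average_velocity_curve(frames):
    """Mean per-pixel motion magnitude between consecutive frames.
    A crude frame-differencing proxy for the holistic average velocity
    curve of the ACM-MM05 paper (assumed simplification)."""
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

# Synthetic clip: a bright square that moves fast, then slow.
T, H, W = 10, 32, 32
clip = np.zeros((T, H, W))
for t in range(T):
    x = 5 + 2 * t if t < 5 else 15 + (t - 5)   # fast phase, then slow phase
    clip[t, 10:20, x:x + 5] = 1.0

curve = average_velocity_curve(clip)
print(len(curve))   # one value per frame transition → 9
```

A normal and a diseased heart would trace qualitatively different curves of this kind, which is what makes the curve usable as a discriminative pattern.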

Our next approach segmented cardiac chambers using a motion-based normalized cut segmentation algorithm. We then registered motion patterns of corresponding regions using active shape models. Volumetric trajectories and other measures were extracted from corresponding regions for separating normal from diseased heart motion. This method is described in our MICCAI07 paper.

Recently we have looked at direct disease recognition from cardiac chamber size and motion using active shape models for modeling the cardiac chambers. This is described in our EMBC08 paper.

See also work by me and my colleagues on recognizing viewpoints in echo videos.


Heart sound retrieval

Heart auscultation, i.e., listening to the sounds produced by the heart, is a common practice in the screening of heart disease. Although different diseases produce characteristic sounds, forming a diagnosis based on sounds heard through a stethoscope is a skill that takes years to perfect. Auditory discrimination of heart sounds is inherently difficult, as these sounds are faint and lie at the lower end of the audible frequency range. For this work, our main intuition was that if we could form visual representations of heart sounds, these could actually be more discriminatory than auditory renderings. The figure below shows examples of heart sounds that can be distinguished based on their visual appearance.

We analyze heart sound signals by extracting envelope curves and matching them using essentially the same transform that was used for aligning EKG signals. By matching the envelope shape signatures, we are able to retrieve heart sounds of matching diseases. Details of this work are in our ECCV08 paper.
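A minimal envelope extraction can be sketched with rectification and a moving average; this is an assumed simplification of the envelope (envelogram) computation, and the "lub-dub" signal below is synthetic.

```python
import numpy as np

def envelope(signal, win=32):
    """Amplitude envelope via rectification and a moving average.
    A simple stand-in for the envelope extraction used in heart
    sound matching (the paper's exact method may differ)."""
    rectified = np.abs(signal)
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode='same')

fs = 1000
t = np.arange(0, 1.0, 1 / fs)
# Toy heart sound: two short bursts ("lub" and "dub") of a 60 Hz tone.
s = np.zeros_like(t)
for center in (0.2, 0.6):
    burst = np.exp(-((t - center) ** 2) / 0.001)
    s += burst * np.sin(2 * np.pi * 60 * t)

env = envelope(s)
# The envelope peaks near one of the two sound bursts.
peak_t = t[np.argmax(env)]
print(abs(peak_t - 0.2) < 0.05 or abs(peak_t - 0.6) < 0.05)
```

Once the signal is reduced to an envelope shape, the same warping-based shape matching used for EKGs applies directly.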


Multimedia retrieval

Here I highlight some of my research in this field from the past 10 years. It is not the most current work, but it shows the variety of techniques we developed, which somehow keep getting rediscovered in recent work.

Topic identification in videos

A long-standing goal of distance learning has been to provide a quality of learning comparable to the face-to-face environment of a traditional classroom for teaching or training. Effective means of high-level semantic querying, such as the retrieval of learning material relevant to a topic of discussion, can go a long way toward achieving this goal. We worked on identifying video segments relating to a topic of discussion by indexing videos using the image and text content of foils. The foil images were localized in video using the color and spatial layout geometry of their regions. We then searched the audio associated with the video based on the text content of the foil to identify related video segments in which concepts represented on a foil are heard. Finally, we combined the results of the foil image and text searches of the video by exploiting their time co-occurrence. See our ACM Multimedia'00 paper for more details. Joint work with Savitha Srinivasan.


Location hashing

Queries referring to content embedded within images are an essential component of content-based search, browse, or summarize operations in image databases. Localizing such queries under changes in appearance, occlusions, and background clutter is a difficult problem. This work presented a new method of indexing image databases called location hashing, which uses a special data structure called the location hash tree (LHT) for organizing feature information from the images of a database. Location hashing is based on the principle of geometric hashing and determines simultaneously the relevant images in the database and the regions within them that are most likely to contain a 2D pattern query, without incurring a detailed search of either. The location hash tree, being a red-black tree, allows for efficient search of candidate locations using pose-invariant feature information derived from the query. See our CAIVD'99 paper from CVPR 1999 for more details. Joint work with Prabhakar Raghavan (now at Yahoo) and Nimrod Megiddo.
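The geometric hashing principle behind location hashing can be shown with a toy point-pattern version: hash each point's coordinates in every 2-point basis frame (similarity-invariant here), then let query features vote. This is only an illustration of the principle; the real LHT stores richer, pose-invariant features and a tree-structured index, and the patterns below are made up.

```python
import numpy as np
from collections import defaultdict
from itertools import permutations

def basis_keys(pts, quant=0.25):
    """Quantized coordinates of each point in every 2-point basis frame
    (similarity-invariant). A toy version of the invariant features
    hashed by location hashing."""
    pts = np.asarray(pts, float)
    keys = []
    for i, j in permutations(range(len(pts)), 2):
        d = pts[j] - pts[i]
        scale = np.hypot(*d)
        if scale < 1e-9:
            continue
        c, s = d / scale
        for k in range(len(pts)):
            if k in (i, j):
                continue
            vx, vy = (pts[k] - pts[i]) / scale
            u = (c * vx + s * vy, -s * vx + c * vy)   # rotate into basis frame
            keys.append((round(u[0] / quant), round(u[1] / quant)))
    return keys

def build_table(models, quant=0.25):
    table = defaultdict(set)
    for name, pts in models.items():
        for key in basis_keys(pts, quant):
            table[key].add(name)
    return table

def query(table, pts, quant=0.25):
    votes = defaultdict(int)
    for key in basis_keys(pts, quant):
        for name in table.get(key, ()):
            votes[name] += 1
    return votes

models = {'L': [(0, 0), (0, 2), (1, 0)], 'bar': [(0, 0), (1, 0), (2, 0)]}
table = build_table(models)
votes = query(table, [(0, 0), (0, 4), (2, 0)])   # the 'L' pattern, scaled 2x
print(max(votes, key=votes.get))                  # → 'L'
```

The key property, as in the paper, is that the query identifies likely matches by table lookups and voting, without a detailed search over models or image regions.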


Region hashing

As image databases grow large, index structures for fast navigation become important. In particular, when the goal is to locate object queries in image databases under changes in pose, occlusions, and spurious data, traditional index structures used in databases become unsuitable. Region hashing was a technique for indexing images in a database, with each image treated as a collection of regions; in our work, we demonstrated it with color regions. The region layout was represented in an affine-invariant coordinate system using affine intervals to bound the regions. An interval hash tree was then created as the indexing structure across images using the affine intervals per image. This allowed for efficient localization of multi-region object queries, and we demonstrated the utility of the index structure in a large image database. See our CAIVD'99 paper for more details. Early work on using region hashing for slide localization in video was described in our CVPR 2000 paper.



CueVideo

From 1997-2000, we worked on a series of techniques for representing and querying audio-video content in the CueVideo project. We had ways of segmenting and recognizing the audio track, converting speech to text, offering text-based retrieval of transcribed audio, and identifying topics in videos. We had ways of summarizing video content through various means, including a way of modeling user interest using HMMs to automatically generate video previews. See our ACMM2001 paper among others.



QBIC

The work in QBIC (Query by Image Content) was one of the earliest efforts in content-based retrieval. It established the visual querying paradigm as distinctly different from SQL or other structured query languages for databases. I introduced the term 'image indexing' in QBIC to refer to the retrieval of images containing object queries extracted from other images. This was in contrast to object indexing, which was popular in computer vision at the time for indexing databases consisting of pre-segmented objects. In image indexing, both the images containing the object as well as the regions containing the object within the selected images had to be identified. See my early papers from SPIE 1995.


Clustering and Learning

The field of clustering and learning has become a busy one lately. Our work in this area tries to take a fresh perspective. Unlike most work in computer vision that uses techniques from machine learning, we try to do the opposite, i.e., bring pattern recognition techniques to machine learning. Examples of this work are:

Order-preserving clustering

For ordered data sets, such as time series, age-varying data, etc., projecting them as points in a multi-dimensional space may cause loss of order during clustering. Using a modeling approach such as HMMs or auto-regressive models for the data and clustering in parameter space usually loses sensitivity in clustering. In this work, we modeled ordered data sets as spatial curves and used the key idea of transforming distances between points into estimates of the bends in the curves. See our CSB2003 paper for an application of this clustering method to gene expression time series.
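The curve-bend idea can be sketched by treating a series as a planar curve (index, value) and measuring turning angles; this is a simplified take (the actual distance transform in the paper is more involved), with toy series below.

```python
import numpy as np

def bend_angles(series):
    """Turning angles of a time series viewed as a planar curve (t, value).
    A simplified sketch of the 'bends in the curve' idea; the CSB2003
    method is more involved."""
    pts = np.column_stack([np.arange(len(series)), np.asarray(series, float)])
    v = np.diff(pts, axis=0)                 # segment direction vectors
    angles = np.arctan2(v[:, 1], v[:, 0])    # direction of each segment
    return np.diff(angles)                   # change of direction at each point

rising = [0, 1, 2, 3, 4, 5]
peaked = [0, 2, 4, 2, 0, -2]
# A monotone series has no bends; the peaked one bends sharply at its maximum.
print(np.abs(bend_angles(rising)).max() < np.abs(bend_angles(peaked)).max())
```

Clustering on such bend descriptors preserves the ordering information that a bag-of-points embedding would discard.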


Structural clustering

The most popular way to cluster is based on a distance constraint. However, distance as a criterion alone can sometimes be misleading for clustering and classification. The figure below shows a cluster of points and two new points A and B. Although both A and B are equidistant from the centroid as well as from some supporting extremal points, A is more likely to belong to the cluster (and hence the class) than B.

Thus measures other than distance may have to be considered for clustering. In our work, we developed a new hierarchical unsupervised clustering algorithm that used the principle of perceptual grouping based on distance, orientation, density, and spatial overlap of projected clouds to group sparse point sets in multidimensional spaces. The resulting clusters look very different from those obtained by traditional clustering methods and in some sense capture the inherent shape that can be perceived in the point distributions. See our CVPR2007 paper for a description of this clustering and its comparison to other methods.


Odd-man-out principle for clustering and categorization

We are now working on a new algorithm for clustering and categorization using the odd-man-out principle used in IQ tests. A paper on this will soon be available.


Action recognition

The formal modeling of actions and events began in earnest at ICCV 2001, where we conducted the first Event 2001 workshop. Recognizing actions and activities has since become a mainstream topic in computer vision and multimedia.

Action Cylinders

In this work, we showed that actions could be recognized as objects. In particular, we showed that the successive projected images of a 3D object undergoing motion in space could be modeled as a generalized cylinder called the action cylinder. Reliable recognition is achieved by recovering the viewpoint transformation between the reference (model) and given action cylinders. A set of 8 corresponding points from time-wise corresponding cross-sections is shown to be sufficient to align the two cylinders under perspective projection. A surprising conclusion from visualizing actions as objects is that rigid, articulated, and nonrigid actions can all be modeled in a uniform framework. See our ICCV-Event2001 workshop paper for details. Modeling actions as spatio-temporal shapes has now become well accepted, with several follow-on works describing similar concepts such as action sketches, space-time volumes, etc. This was original joint work with Alex Vasilescu from MIT and Saratendu Sethi from Boston University.
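The underlying data structure can be sketched as a stack of time-wise contour cross-sections, each resampled to a fixed number of points. This only illustrates how a cylinder is assembled from a toy "beating" contour; the recognition step (recovering the viewpoint transform between two cylinders) is not shown, and all shapes and sizes here are made up.

```python
import numpy as np

def action_cylinder(contours, n_points=16):
    """Stack time-wise object contours, each resampled to n_points, into a
    T x n_points x 2 'action cylinder' volume. A minimal data-structure
    sketch; the workshop paper additionally recovers the viewpoint
    transformation between two such cylinders for recognition."""
    cyl = []
    for pts in contours:
        pts = np.asarray(pts, float)
        idx = np.linspace(0, len(pts) - 1, n_points)
        resampled = np.column_stack([
            np.interp(idx, np.arange(len(pts)), pts[:, 0]),
            np.interp(idx, np.arange(len(pts)), pts[:, 1]),
        ])
        cyl.append(resampled)
    return np.stack(cyl)

# Toy action: a circle whose radius oscillates over time (a "beating" contour).
theta = np.linspace(0, 2 * np.pi, 40)
frames = [np.column_stack([(1 + 0.2 * np.sin(t)) * np.cos(theta),
                           (1 + 0.2 * np.sin(t)) * np.sin(theta)])
          for t in np.linspace(0, 2 * np.pi, 12)]
cyl = action_cylinder(frames)
print(cyl.shape)   # → (12, 16, 2)
```

Treating the whole T x N x 2 volume as one shape is what lets rigid, articulated, and nonrigid actions share a uniform representation.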


Action Events

We developed approaches to recognize various types of action events and to retrieve videos containing them. Much of our work from 2001-2004 covers this aspect. See our ICCV03 and ACM Multimedia'03 papers. Joint work with Mubarak Shah's group at UCF.


Action segmentation

In this work, we interpreted the scale-space image formed from the average velocity curve to segment activities. The average velocity curve is the axis of the action cylinder. See our ICPR06 paper.


Application-specific action patterns

In this work, we applied action cylinders to look for characteristic motion patterns in cardiac echo videos. Both chamber and valvular motion were captured. The average velocity curve (the axis of the cylinder) was used to form discriminatory measures for diseases. See our Cinc2006 paper. Joint work with Jing Yang from Yale University.


Object/shape recognition

Much of my work on object recognition has dealt with the selection problem, namely, locating where an interesting object may be in a scene. I have also looked at nonrigid shape models for handling flexible objects or to capture the shape variations in an object category.

Attentional Selection

My thesis addressed attentional selection in object recognition. Here I developed computational models of visual attention catering to two modes, namely, 'attract attention' and 'pay attention'. We implemented these models using color, texture, and line grouping, resulting in fast selection of object-containing regions in images. Of these, attentional selection using color has been cited most often. My Ph.D. thesis is available as MIT AI Lab Memo 1420.


Constrained affine shape models

In this work, we modeled the shape variations among objects in a category (e.g., faces) as a collection of constrained affine transforms that obey spatial layout constraints with respect to a line of symmetry. This shape model also served as the category prototype for indexing large databases. An application of this model to document retrieval is described in our CAIVD'96 paper from CVPR'96. Joint work with Wei Zhu, University of Rochester.


Affine kernels

Affine kernels are an alternative to pyramid kernels. More on this work will be reported shortly.


Document Analysis

Prior to our entry into document analysis, much of the work in the field was focused on OCR. We introduced shape models and shape recognition techniques into document layout analysis and form recognition, as well as handwriting recognition. Our work in this area is now expanding to multimodal documents, including text recognition in cardiac echo videos.

Text recognition in cardiac echo videos

Many important diagnostic measurements are captured by echocardiographers in screenshots during an echo study. Not all of these measurements are captured in the DICOM description of the echo study. In this work, we automatically recognize the measurement-value pairs from echo video frames and use them for disease-specific retrieval of cardiac echo videos. This is described in our ICDAR'09 paper. Joint work with David Beymer and Arnon Amir from my group.


ECG Document interpretation

In this work, we interpret ECG documents to extract periodicity. We show that the image-rendered form of the ECG, as available in scanned ECG paper documents or as ECG traces in echocardiogram video frames, is actually easier to extract periodicities from than the digital form. The idea exploits a pixel dithering trick, as explained in our ICDAR09 paper. Joint work with Fei Wang and David Beymer, with the work led by Fei.
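A basic periodicity estimate on a rendered trace can be sketched with autocorrelation over a column profile; this is an assumed simplification (the paper's pixel dithering trick differs), and the spike train below is a synthetic stand-in for a rhythm strip.

```python
import numpy as np

def dominant_period(trace, min_lag=5):
    """Dominant period of a rendered trace via autocorrelation of its
    column profile. A simplified stand-in for the pixel-level analysis
    in our ICDAR09 paper."""
    x = np.asarray(trace, float)
    x = x - x.mean()
    # Autocorrelation at non-negative lags.
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    # Skip tiny lags so the zero-lag peak is not reported.
    return int(min_lag + np.argmax(ac[min_lag:]))

# Synthetic column profile: one QRS-like spike every 40 columns.
cols = np.zeros(400)
cols[::40] = 1.0
print(dominant_period(cols))   # → 40
```

On real scanned traces, the same autocorrelation peak corresponds to the beat-to-beat spacing in pixels, which converts to a heart rate once the paper speed is known.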


FormPad: A digital notepad

Paper-based forms are ubiquitous in hospital environments. With the high volume of forms being scanned, and the difficulty of handwriting recognition from filled form entries, most electronic record systems simply store the form images, with the field label information entered manually. FormPad was an invention in which we designed a camera-assisted writing tablet. It preserves the familiar experience of filling out a paper form while allowing automatic conversion of relevant handwritten field entries into electronic form, without explicit form scanning. To enable this, we had to recognize the form that was on the tablet as well as accurately project the fields into the electronic form. Form indexing is modeled as a problem of shape-based content retrieval using the perspectively-distorted form appearances seen from the tablet camera. Fast form indexing is achieved using geometric hashing based on projective invariants. We derived field projection as a sequence of projective transformations between the tablet, the camera, and the original electronic form coordinates. Our ACCV06 paper provides further details. Joint work with Tom Zimmerman.
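The field-projection step composes projective transforms; a minimal numpy sketch is below. The matrices are made-up placeholders (here the camera-to-form transform is simply the inverse of the tablet-to-camera one, so the round trip recovers the original field coordinates); the real transforms come from calibration and form registration.

```python
import numpy as np

def apply_h(H, pts):
    """Apply a 3x3 homography to Nx2 points with homogeneous normalization."""
    pts = np.asarray(pts, float)
    ph = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

# Hypothetical tablet->camera homography (placeholder values).
H_tc = np.array([[1.2, 0.1, 5.0],
                 [0.0, 0.9, 3.0],
                 [0.001, 0.0, 1.0]])
H_cf = np.linalg.inv(H_tc)    # toy camera->form transform for the round trip

# Field projection as a composition of projective transforms.
H = H_cf @ H_tc
corner = apply_h(H, [[10.0, 20.0]])
print(np.allclose(corner, [[10.0, 20.0]]))
```

In the actual pipeline, each homography in the chain is estimated separately, and composing them maps a handwritten field region on the tablet into its slot in the electronic form.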


Handwriting indexing

An important problem in the management of scanned handwritten document image collections is their retrieval based on word queries. In this work, we handled the variations in a single author's handwriting using a constrained non-rigid affine transform. Handwritten documents were pre-processed and analyzed to extract word-containing regions. Different instances of the same word within a document were located using a variant of location hashing. See our paper in DIA'96 at CVPR'96 for more details.


Technical line drawing indexing

Image indexing, namely, the problem of retrieving content information from images in response to queries, is a key problem underlying the operations in image databases. In this work, we presented a method of indexing for 3D object queries in a database of a class of images called technical line drawings. Indexing is achieved as a combination of query-specific region selection and object recognition. The selection phase isolates relevant images and the regions in these images that are likely to contain the queried object. This is done using text information in the query and a grouping mechanism that is guaranteed to isolate single-object-containing regions for the class of technical line drawing images. The grouping mechanism is an adaptation of Waltz relaxation to an extended junction set derived by analyzing the physically plausible ways in which interpretation lines interact with object contours. Model-based object recognition then confirms the presence of the part at the selected location using a geometrical description of the queried 3D object. See our Trans. PAMI'99 paper for more details.


Color and Texture

Early work in color image segmentation and color-region-based querying was done during my thesis research. It also included work to measure saliency of regions based on color and texture, a topic that is now becoming popular again.

Color Saliency

In this work, we exploited the perceptual categorization of colors. While humans can distinguish between thousands of nuances of color, their ability to remember color is only approximate and divided into a handful of perceptual categories. Using an analogy to the color palettes used in painting, we designed a set of perceptual categories that allowed images to be described in terms of illuminant-invariant color descriptors. These descriptions were used to segment images based on color as well as to identify salient color regions. See our IJCV'97 paper for more details.
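The categorization step can be sketched as nearest-neighbor assignment to a small named palette. The palette below is entirely made up for illustration; the IJCV'97 categories were derived more carefully and designed for illuminant invariance.

```python
import numpy as np

# A tiny, hypothetical perceptual palette (illustrative values only).
PALETTE = {
    'red':   (200, 30, 30),
    'green': (30, 160, 60),
    'blue':  (40, 60, 200),
    'white': (240, 240, 240),
    'black': (15, 15, 15),
}

def categorize(pixel):
    """Assign an RGB pixel to its nearest perceptual color category."""
    names = list(PALETTE)
    centers = np.array([PALETTE[n] for n in names], float)
    d = np.linalg.norm(centers - np.asarray(pixel, float), axis=1)
    return names[int(np.argmin(d))]

print(categorize((210, 40, 25)))     # → 'red'
print(categorize((250, 250, 250)))   # → 'white'
```

Once pixels are reduced to a handful of category labels, segmentation and saliency reduce to operations over a small, stable symbol set rather than raw RGB values.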


Surface Color Classes

In this work, we gave a formal way of relating the apparent colors of different members of the same surface color class (e.g., skin color). This allowed stable segmentation and recognition of colored regions. See our ICPR'96 paper for more details. Joint work with Yong-Qing Cheng from the University of Rochester.


Salient texture

What makes a texture region salient? This paper was the first to address the issue of texture saliency using a combination of signal processing and perceptual organization. It was part of my thesis work and is described in our CVIU journal paper from 1999.


Texture recognition

In this work, we explored the use of texture or pattern information on a 3D object as a cue to isolate regions in an image that are likely to come from the object. We developed a representation of texture based on the linear prediction (LP) spectrum that allows recognition of the model texture under changes in orientation and occlusions. See our BMVC'93 paper for details.


Retrieving color regions

This includes our early work on recognizing objects by their color region layout using region adjacency graphs (ECCV'92). Other work in this area is color region hashing, which has been described above.



Bioinformatics

Our work in this area is predominantly guided by the development and application of novel pattern recognition techniques in the field of bioinformatics.

Prediction of cell-cycle regulation

The key idea put forward here was that, by monitoring time-varying gene expression through the cell cycle as a spatial curve, the salient bends in the curve could be indicative of the regulatory phase within a cell cycle for a particular gene. The salient bends in the curve were analyzed using a scale-space curvature detection method. Details are described in our ICASSP'03 paper.
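A single-scale version of the bend detector can be sketched as Gaussian smoothing followed by locating the strongest curvature extremum; the ICASSP'03 method tracks extrema across scales, so this is a simplified assumption, and the expression profile below is synthetic.

```python
import numpy as np

def gaussian_smooth(x, sigma):
    """1-D Gaussian smoothing by direct convolution with reflect padding."""
    r = int(3 * sigma) + 1
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    xp = np.pad(x, r, mode='reflect')
    return np.convolve(xp, g, mode='valid')

def salient_bend(series, sigma=2.0):
    """Index of the strongest curvature extremum of the smoothed series.
    A minimal single-scale sketch of scale-space curvature detection."""
    s = gaussian_smooth(np.asarray(series, float), sigma)
    curvature = np.gradient(np.gradient(s))   # discrete second derivative
    return int(np.argmax(np.abs(curvature)))

# Toy expression profile peaking mid cell cycle; the bend lands near the peak.
t = np.linspace(0, 1, 60)
profile = np.exp(-((t - 0.5) ** 2) / 0.01)
print(abs(salient_bend(profile) - 30) <= 5)
```

In the cell-cycle application, the position of such a bend along the time axis is what suggests the regulatory phase of the gene.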


AI and Semantic Web

Knowledge representation has been a latent interest of mine. Early work in this area was on proving the decidability of inference relations. More recent work was on the application of content-based retrieval techniques to search and match structured documents (e.g., XML schemas).

Automatic theorem provers

Theorem provers have always been of interest to the AI community. Previous methods for proving theorems were model-theoretic mechanisms. In this work, we introduced a syntactic proving method that could be easily mechanized. This led to later work on a taxonomic syntax for natural language. Our Knowledge Representation and Reasoning conference paper describes this work. Joint work with Dave McAllester and Bob Givan, MIT.


Semantic API Matching

This work influenced the code integration community, as it showed a way of automatically finding connectivity between code components to allow ease-of-use programming. It was part of the MineLink system we built. Our WWW'03 paper describes it for web service composition. The key idea is the use of a cost-scaling algorithm for bipartite graph matching applied to the APIs of code components. Joint work with Doina Caragea, University of Iowa.
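The matching objective can be illustrated with a brute-force minimum-cost assignment over a tiny cost matrix. The WWW'03 system used a cost-scaling algorithm, so this exhaustive version is only a stand-in for small inputs, and the parameter names and costs are illustrative, not from the paper.

```python
import numpy as np
from itertools import permutations

def best_assignment(cost):
    """Minimum-cost bipartite assignment by brute force over permutations.
    A toy stand-in for the cost-scaling matcher of the WWW'03 paper,
    usable only for small API signatures."""
    n = len(cost)
    best, best_cost = None, float('inf')
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best, best_cost = perm, c
    return best, best_cost

# Hypothetical dissimilarities between one API's outputs and another's inputs.
cost = np.array([[0.1, 0.9, 0.8],    # 'price'  vs ('amount', 'date', 'id')
                 [0.7, 0.2, 0.9],    # 'when'   vs ...
                 [0.8, 0.9, 0.1]])   # 'itemId' vs ...
match, total = best_assignment(cost)
print(match)   # → (0, 1, 2): price→amount, when→date, itemId→id
```

The total assignment cost then serves as a connectivity score between the two APIs, which is what drives automatic service composition.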


Semantic search of schemas

This work presented a content-based retrieval system for XML and web service schemas. It allowed a query-by-example paradigm in which a sample schema could be used to query a database of schemas. The information used for matching included the names of fields, their types, and the structural information captured in schemas. Our WWW'04 paper describes it for web service schemas. Joint work with colleagues from the IBM SVL Lab.


Semantic and ontological matching

This work showed the use of service description ontologies in OWL as an additional data source for matching services. Our ICWS'05 paper describes it for web services. Joint work with Rama Akkiraju and Richard Goodwin at the IBM TJ Watson Research Center.


Signal Processing

I started my research career in signal processing, addressing the problem of signal reconstruction from partial data. We developed several approaches to this problem using STFT magnitude and phase, the group delay function, maximum entropy methods, etc. Work from this era is described in our Trans. ASSP'89 and ASSP'87 papers, for example.
