| Past Almaden Projects |
CueVideo is an ongoing research project to address the challenges that arise in the creation,
indexing and use of large video databases. The project was started in 1997 in the Visual
Media Management Group at IBM's Almaden Research Center.
|
With advances in hardware and network technology, the proliferation of ever smaller
and cheaper video capture devices, and the emergence of the web, more and more
media-rich applications are becoming economically feasible. One of the key emerging
areas is distance or distributed learning. For applications in this area, the content
usually consists of media-rich material, such as video, audio, text and foils.
Video content, when properly hyperlinked to text and foils, significantly enhances
learning and the communication experience. Video provides the realism, interest and
detail not available in other media and it is critical in many areas, such as medical,
maintenance, sales and training.
|
The two bottlenecks preventing video from becoming an integral part of distributed
learning are not the cost of basic hardware and software but:
a) the cost and time to index and hyperlink the video; and
b) the difficulty users face in searching and browsing the video content.
|
CueVideo addresses these bottlenecks. It is rapidly moving to full automation
of the indexing and hyperlinking process. CueVideo combines video and audio
analysis, speech recognition (building on IBM’s ViaVoice(TM)), information
retrieval and artificial intelligence. It offers novel functionality, such as
moving storyboards and smart fast video and audio browsing. The CueVideo project
works in collaboration with IBM's T. J. Watson (speech and information retrieval)
and Haifa (audio analysis) research laboratories.
|
CueVideo is packaged as a modular system with two basic components. The first
is an off-line indexing engine that computes indices, hyperlinks and browsing
data, all of which are saved on the CueVideo server. The second is an interactive
user interface that provides the user with CueVideo's advanced tools for
searching and browsing videos. The CueVideo client runs in standard web
browsers using standard media plug-ins like RealNetwork(TM) or QuickTime(TM).
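The split between off-line indexing and interactive search can be illustrated with a minimal Python sketch. It is not CueVideo code: the function names, the JSON storage format and the sample transcript are all assumptions made for illustration, and a real system would work on speech recognizer output rather than a hand-written word list.

```python
import json
from collections import defaultdict


def build_index(transcript):
    """Off-line step: map each lowercased word to the sorted times (seconds) at which it is spoken."""
    index = defaultdict(list)
    for start_time, word in transcript:
        index[word.lower()].append(start_time)
    return {word: sorted(times) for word, times in index.items()}


def search(index, query):
    """Interactive step: return the time stamps at which the query word occurs."""
    return index.get(query.lower(), [])


# Hypothetical recognizer output: (start time in seconds, word).
transcript = [
    (0.0, "welcome"), (0.6, "to"), (0.9, "video"), (1.4, "indexing"),
    (12.3, "Video"), (12.8, "browsing"),
]

# The index is computed once and serialized, standing in for data saved on the server.
stored = json.dumps(build_index(transcript))

# The interactive component loads the stored index and answers queries against it.
index = json.loads(stored)
hits = search(index, "video")
print(hits)
```

Each returned time stamp is a point at which a player could start playback, which is the idea behind direct, non-linear access to the video.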
|
Key Innovations in CueVideo
The key innovations in the CueVideo Toolkit are:
- Fully automated video indexer, including speech indexing, scene change detection, and generation of multiple browsable views.
- Advanced speech retrieval server that finds time-stamped matches in the speech for any text query.
- Non-linear, direct access to videos at query match points.
- Application interface over the internet, with embedded streaming video.
- Smart video browsing technology, including full video, audio-visual slide shows, and fast audio with natural pitch. All streaming modes are fully synchronized and instantaneously switchable.
- SDK that includes both indexing and server APIs and sample applications.
- Support for multiple input video formats: MPEG, QuickTime, AVI, H.263, etc.
- Scalable server architecture.
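As an illustration of the scene change detection mentioned in the list above, here is a minimal Python sketch of one common textbook approach: comparing intensity histograms of consecutive frames. This page does not say which method CueVideo uses, so the technique, the function names, the bin count and the threshold are all assumptions, and frames are modelled simply as lists of grayscale pixel values.

```python
def histogram(frame, bins=4, max_value=256):
    """Count pixel intensities into equal-width bins, normalized to sum to 1."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [count / total for count in counts]


def scene_changes(frames, threshold=0.5):
    """Return indices of frames whose histogram differs sharply from the previous frame."""
    if not frames:
        return []
    changes = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # L1 distance between two normalized histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            changes.append(i)
        previous = current
    return changes


# Toy "video": two dark frames, two bright frames, then a dark frame again.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
frames = [dark, dark, bright, bright, dark]
cuts = scene_changes(frames)
print(cuts)
```

Dividing a detected frame index by the frame rate would give a time stamp, so detected cuts could serve as entry points for a storyboard-style view.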
|
Papers
G. Ashour, B. Dom, J. Golden, J. Pieper, and S. Srinivasan, "Who is SMILing on the Web?", in Poster Proceedings of WWW-10, May 2001.
A. Amir, G. Ashour and S. Srinivasan, "Towards Automatic Real Time Preparation of On-Line Video Proceedings for Conference Talks and Presentations", Thirty-Fourth Hawaii Int. Conf. on System Sciences, HICSS-34, Maui, January 2001.
S. Srinivasan and D. Petkovic, "Phonetic Confusion Matrix Based Spoken Document Retrieval", in Proceedings of SIGIR-2000, Greece, July 2000.
W. Niblack, S. Yue, R. Kraft, A. Amir and N. Sundaresan, "Web-Based Searching and Browsing of Multimedia Data", IEEE Int. Conf. on Multimedia and Expo, New York, USA, July 2000.
S. Srinivasan, D. Petkovic, D. Ponceleon, and M. Viswanathan, "Query Expansion for Imperfect Speech: Applications in Distributed Learning", in CBAIVL-2000, IEEE Workshop on Content-based Access of Image and Video Libraries, Hilton Head Island, South Carolina, June 2000.
S. Srinivasan, D. Ponceleon, A. Amir, B. Blanchard, D. Petkovic, "Engineering the Web for Multimedia", in Web Engineering workshop (WEBE), WWW-9, Amsterdam, May 2000.
A. Amir, D. Ponceleon, B. Blanchard, D. Petkovic, S. Srinivasan, and G. Cohen, "Using Audio Time Scale Modification for Video Browsing", Hawaii Int. Conf. on System Sciences, HICSS-33, Maui, January 2000 (Best Paper Award).
S. Srinivasan, D. Ponceleon, A. Amir, and D. Petkovic, "What is in that video anyway? In Search of Better Browsing", Proceedings of 6th IEEE Int. Conf. on Multimedia Computing and Systems, pp. 388-392, Florence, Italy, June 1999.
|
Presentations
|
For more information, contact savitha@almaden.ibm.com