Conversational And Multimodal User Interaction

David Nahamoo

IBM Research

T.V. Raman

IBM Research

Abstract

Speech technologies have come of age, and it is now time to integrate speech interaction as a first-class citizen into human-computer user interfaces. First-class spoken interaction requires more than reading a visual presentation aloud or having the user speak what would otherwise have been entered via the keyboard. Leveraging the benefits of rich spoken interaction requires the design of conversational and multimodal user interfaces that exploit the unique advantages of each available user interface modality. This tutorial will cover aspects of conversational and multimodal user interaction, including emerging Web standards that support such interaction, deployment frameworks that facilitate the creation and delivery of such user interaction, and the resulting challenges for researchers working on speech and natural language technologies.


Table of Contents

Standards Overview
Creating And Deploying Multimodal Interaction
Creating And Deploying Conversational Interaction
Emerging Challenges For Speech Research

Standards Overview

Open standards encourage interoperability. Below, we enumerate the relevant standards for authoring and deploying multimodal interaction.

W3C XHTML
W3C XForms
W3C XML Events
W3C VoiceXML
XHTML+Voice

Creating And Deploying Multimodal Interaction

Multimodal interaction can be deployed locally on client devices or distributed across the network. Below, we enumerate some of the deployment scenarios:

GUI and voice processing on local device, e.g., PDAs.
GUI with limited speech processing on local device, e.g., smart phones.
GUI on client with voice processing across the network, e.g., cell phones.

Each of these deployment scenarios requires various levels of synchronization across the available modalities.
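
One way to picture this synchronization is a single data model that every modality observes, so that a value entered by voice appears in the GUI and vice versa. The sketch below is only an illustration under that assumption; `SharedModel`, `attach`, and `update` are hypothetical names and are not taken from any of the standards listed above.

```python
from typing import Callable, Dict, List

# A listener receives (field, value, source modality).
Listener = Callable[[str, str, str], None]


class SharedModel:
    """Hypothetical single data model observed by every modality."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._listeners: Dict[str, Listener] = {}

    def attach(self, modality: str, listener: Listener) -> None:
        """Register one listener per modality (e.g. "gui", "voice")."""
        self._listeners[modality] = listener

    def update(self, field: str, value: str, source: str) -> None:
        """Store the value, then notify every modality except the source."""
        self._values[field] = value
        for modality, listener in self._listeners.items():
            if modality != source:
                listener(field, value, source)

    def get(self, field: str) -> str:
        return self._values[field]


log: List[str] = []
model = SharedModel()
model.attach("gui", lambda f, v, s: log.append(f"gui shows {f}={v} (from {s})"))
model.attach("voice", lambda f, v, s: log.append(f"voice notes {f}={v} (from {s})"))

model.update("city", "Boston", source="voice")  # the user speaks the city
model.update("day", "Friday", source="gui")     # the user types the day
```

In a distributed deployment the call to `update` would cross the network, which is where the tighter or looser levels of synchronization come into play.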

Creating And Deploying Conversational Interaction

Conversational interaction is relevant both for voice-only interfaces and for multimodal interaction. Conversational interaction uses techniques from Natural Language Processing (NLP) to create rich dialog models that enable rapid task completion. When combined with multimodal interaction, conversational interaction faces the added challenge of integrating multiple inputs when computing user intent.
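
As a rough illustration of integrating multiple inputs, the sketch below resolves a spoken deictic reference ("here") against the pointer event closest to it in time. It is a minimal sketch, not a description of any particular system: the `InputEvent` type, the `fuse` function, the payload keys, and the 1.5 second window are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputEvent:
    modality: str     # "speech" or "pointer" (hypothetical labels)
    payload: dict     # recognized command or pointer position
    timestamp: float  # seconds


def fuse(events: List[InputEvent], window: float = 1.5) -> Optional[dict]:
    """Combine the latest spoken command with the nearest pointer event."""
    speech = [e for e in events if e.modality == "speech"]
    pointer = [e for e in events if e.modality == "pointer"]
    if not speech:
        return None
    command = speech[-1]
    intent = dict(command.payload)
    if intent.get("target") == "here":  # deictic reference needs a pointer
        nearby = [p for p in pointer
                  if abs(p.timestamp - command.timestamp) <= window]
        if not nearby:
            return None  # intent cannot be resolved from speech alone
        closest = min(nearby,
                      key=lambda p: abs(p.timestamp - command.timestamp))
        intent["target"] = closest.payload["position"]
    return intent


events = [
    InputEvent("pointer", {"position": (120, 45)}, 10.2),
    InputEvent("speech", {"action": "zoom", "target": "here"}, 10.6),
]
intent = fuse(events)
```

Here the spoken command and the pointer event fall within the time window, so `intent` carries the action from speech and the target from the pointer. A command with no deictic reference passes through unchanged.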

Emerging Challenges For Speech Research

Here, we enumerate the upcoming challenges for speech research that emerge from the need to deploy speech interaction on a wide variety of emerging pervasive devices.