Look Ma Bell, No Hands! - VoiceXML, X+V, and the Mobile Device
By: Les Wilson
Aug. 3, 2004 12:00 AM
The emerging world without wires has fostered a growing number of small, mobile devices (everything from PDAs to smart phones) capable of accessing data and running applications. The trouble is that while devices are getting smaller, human hands and fingers are not. To help users manage their devices, user interface designers have begun to combine the traditional keyboard-input model with interactive technologies such as voice-directed input. This type of interaction, in which the user has more than one means of accessing data on the device, is sometimes called multimodal interaction. It is fast becoming the norm in wireless mobile computing.

If asked, most developers will cite speed and efficiency as the main reasons for developing multimodal interfaces. Parallel input (for example, the ability to both key in commands and voice them) lets users access and respond to information from their devices more quickly. But multimodal systems don't just enable faster interactions; they also add value to the overall experience of interaction. Multimodal interfaces leave more room for user preference (giving users a choice of how they interact with the system) and reduce the overexertion that can result from single-modality interaction. Being able to switch between modes of interaction can lead to a lower incidence of error (because users can choose the mode best suited to each activity), as well as easier error recovery. Finally, multimodal interfaces can accommodate a wider range of tasks and environments.

Speech adds tremendous value to small mobile devices, and at the same time mobility and wireless connectivity are moving computing into new physical environments. Wireless networks now provide connectivity anywhere and anytime, linking mobile devices to back-end data wherever the user happens to be.
If the need for multimodal interaction extends to the network, then the Internet needs new technologies and standards to enable that functionality. Increasingly, Web developers are seeking ways to turn existing visually oriented Web pages into multimodal ones. That's where X+V comes in.

The XHTML+Voice (X+V) profile brings spoken interaction to standard Web content by integrating the mature XHTML and XML-Events technologies with XML vocabularies developed as part of the W3C Speech Interface Framework. The profile includes voice modules that support speech synthesis, speech dialogs, command and control, and speech grammars. Voice handlers can be attached to XHTML elements and respond to specific Document Object Model (DOM) events, thereby reusing the event model familiar to Web developers. Voice interaction features are integrated with XHTML and CSS and can consequently be used directly within XHTML content.

X+V promises to deliver the feature set, flexibility, and ease of use that developers need to write one application that supports visual-only, voice-only, and multimodal interaction. The versatility of the Web and XML shows in the way X+V integrates VoiceXML into the Web by marrying it with XHTML. X+V brings voice markup to the presentation layer, allowing you to speech-enable each component of the application interface.

Combining Voice and Visual Markup

While both X+V and SALT (Speech Application Language Tags) use W3C standards for grammar and speech synthesis, only X+V is based entirely on standardized languages. X+V's modular architecture makes it simple to separate an X+V application into different components. As a result, X+V applications can be coded in parts, with experts in voice programming developing the voice elements and experts in visual programming developing the visual ones. X+V's modularity also makes it adaptable to stand-alone voice application development: VoiceXML used in an X+V application can be reused inside a stand-alone VoiceXML application.
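To make the idea of attaching voice handlers to XHTML elements more concrete, here is a minimal sketch in Python. It embeds a small, hypothetical X+V-style page (the ids `say_city` and `from_city`, the `reserve` action, and the `cities.grxml` grammar file are invented for illustration) and lists which voice form each visual element activates. It assumes the `ev:event` and `ev:handler` attributes from XML-Events are used for the binding and handles only same-document references. It is not a multimodal browser, only a way to see the structure.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for VoiceXML 2.0 and XML-Events.
VXML_NS = "http://www.w3.org/2001/vxml"
EV_NS = "http://www.w3.org/2001/xml-events"

# Hypothetical X+V-style page: one voice form in <head>, one input in <body>.
XV_PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Departure city</title>
    <vxml:form id="say_city">
      <vxml:field name="city">
        <vxml:prompt>Which city are you leaving from?</vxml:prompt>
        <vxml:grammar src="cities.grxml" type="application/srgs+xml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form action="reserve">
      <input type="text" id="from_city" name="from_city"
             ev:event="focus" ev:handler="#say_city"/>
    </form>
  </body>
</html>
"""


def voice_bindings(page):
    """Return (element id, event, voice form id) for each ev:handler binding."""
    root = ET.fromstring(page)
    # Collect the ids of every voice form declared in the page.
    form_ids = {f.get("id") for f in root.iter("{%s}form" % VXML_NS)}
    bindings = []
    for element in root.iter():
        handler = element.get("{%s}handler" % EV_NS)
        if handler is None:
            continue
        # This sketch only resolves same-document references such as "#say_city".
        target = handler[1:] if handler.startswith("#") else handler
        if target not in form_ids:
            raise ValueError("no voice form with id %r" % target)
        bindings.append((element.get("id"), element.get("{%s}event" % EV_NS), target))
    return bindings


print(voice_bindings(XV_PAGE))  # [('from_city', 'focus', 'say_city')]
```

Reading the result: when the `from_city` input receives focus, the `say_city` voice form is the one to activate. The visual markup and the voice markup stay in separate elements, joined only by the event attributes.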
SALT's reliance on the containing environment makes it very difficult to separate out its coding functions, and also leaves the language ill-suited to stand-alone application development.

Richness is another factor that differentiates the two languages. Whereas SALT defines three tags (Prompt, Listen, and Bind) as its tag set for speech, X+V is based on the mature and tested VoiceXML standard. Because it uses VoiceXML's Form construct for its speech tag set, X+V includes all the utility of "prompt, listen, and bind," and more.

Just as visual markup specifies the visual interface items, voice markup specifies the voice interface items. Speech-enabling an application interface is a matter of first breaking the visual interface into its basic components (for example, an input field for a time of day and a checkbox for "a.m." or "p.m."), then creating a snippet of voice markup for each component, and finally associating each snippet with the existing visual markup for that component. Consider the following questions:
Given an application's visual markup plus a collection of voice markup snippets, you have almost everything you need to create the presentation layer of a multimodal Web application. The only thing you still need is a way to tell the browser which snippets of voice markup go with which visual elements and, because a speech engine can have only one snippet active at a time, when to activate each snippet.

Given that the Web application environment is event-driven, X+V incorporates the DOM eventing framework used in the XML-Events standard. Using this framework, X+V relies on the familiar event types from HTML, such as "on mouse-over" or "on input focus," to create the correlation between visual and voice markup. XML-Events gives X+V a uniform, standards-based eventing model that enables event integration between XML languages.

Separate Files and Reuse

Another advantage of keeping the files separate is reuse, such as the ability to reuse snippets of VoiceXML in numerous XHTML pages. Take a flight-reservation application: when users make a reservation, they are asked whether they want a one-way, round-trip, or multi-leg reservation. For each answer, the system calls up a different form. While the three forms differ with regard to the type of trip, each one has the same departure city. If you have separated out the voice snippet for the departure city, you can reuse it in each of the three XHTML forms, or containers.

The final advantage of keeping the VoiceXML separate from the XHTML is that the snippets of VoiceXML can be reused in containers other than XHTML. In this case, X+V can draw on the VoiceXML notion of documents and forms, wherein a VoiceXML document contains one or more forms. You already know that VoiceXML forms can be linked to XHTML to create multimodal applications.
But such forms can also be stitched together in a VoiceXML document (or container) to create voice-only applications. The end result is that, through reuse, you can create a single application that simultaneously supports multimodal browsers, GUI-only browsers, and voice-only systems such as IVRs.

Conclusion

X+V's foundation in existing XML standards lends it tremendous strength and versatility. Interfaces developed using X+V are portable to a wide range of applications and development environments, can be easily developed in teams, and are highly scalable over time. Developers working with X+V can draw on the numerous resources that come with a well-developed standard such as XML. X+V also spares developers from having to learn a new development language such as SALT or adapt to the constraints of a more visually oriented development environment. Perhaps best of all, X+V does not require training in voice user interfaces or linguistics; a basic knowledge of XML and related standards is sufficient to get started.