Look Ma Bell, No Hands! - VoiceXML, X+V, and the Mobile Device

The emerging world without wires has fostered a growing number of small and mobile devices (everything from PDAs to smart phones) capable of accessing data and running applications. The trouble is, while devices are getting smaller, human hands and fingers are not.

To assist users in managing their devices, user interface designers have begun to combine the traditional keyboard-input model with such interactive technologies as voice-directed input. This type of interaction, in which the user has more than one means of accessing data in his or her device, is sometimes called multimodal interaction. It is fast becoming the norm in the world of wireless mobile computing.

If asked, most developers will cite speed and efficiency as the main reasons for developing multimodal interfaces. Parallel input (for example, the ability to both key in commands and voice them) allows users to access and respond to information delivered by their devices more quickly. Multimodal systems don't just enable faster interactions, however; they also add value to the overall experience of interaction. Multimodal interfaces allow more room for user preference (giving users a choice of how they interact with the system) and reduce the overexertion that can result from single-modality interaction. Being able to switch between modes of interaction can lead to a lower incidence of error (because users can choose the mode best suited to each activity), as well as easier error recovery. Finally, multimodal interfaces can accommodate a wider range of tasks and environments.

Speech adds tremendous value to small mobile devices, but in tandem, mobility and wireless connectivity are also moving computing into new physical environments. Wireless networks now provide connectivity anywhere and anytime, which links mobile devices to back-end data wherever the user happens to be. If the need for multimodal interaction extends to the network, then the Internet needs new technologies and standards to enable that functionality. Increasingly, Web developers are seeking ways to turn existing visually oriented Web pages into multimodal ones. And that's where X+V comes in.

The XHTML+Voice profile brings spoken interaction to standard Web content by integrating the mature XHTML and XML-Events technologies with XML vocabularies developed as part of the W3C Speech Interface Framework. The profile includes voice modules that support speech synthesis, speech dialogs, command and control, and speech grammars. Voice handlers can be attached to XHTML elements and respond to specific Document Object Model (DOM) events, thereby reusing the event model familiar to Web developers. Voice interaction features are integrated with XHTML and CSS and can consequently be used directly within XHTML content.
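
To make the attachment of voice handlers concrete, here is a minimal sketch in Python. The embedded page is illustrative X+V-style markup, not validated against the X+V DTD; the ids `city` and `voice_city` and the grammar file `cities.grxml` are hypothetical. The three namespace URIs are the published ones for XHTML, VoiceXML, and XML Events. The function simply lists which XHTML elements have a voice handler attached and for which DOM event.

```python
import xml.etree.ElementTree as ET

# Namespace URI for XML Events (the ev: prefix in the page below).
EV = "http://www.w3.org/2001/xml-events"

# Illustrative X+V-style page; ids and the grammar file name are made up.
PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Departure city</title>
    <vxml:form id="voice_city">
      <vxml:field name="city">
        <vxml:prompt>Which city are you leaving from?</vxml:prompt>
        <vxml:grammar src="cities.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <input type="text" id="city" ev:event="focus" ev:handler="#voice_city"/>
  </body>
</html>
"""


def voice_bindings(page_source):
    """Return (element id, event name, handler id) for every bound element."""
    root = ET.fromstring(page_source)
    bindings = []
    for element in root.iter():
        event = element.get(f"{{{EV}}}event")
        handler = element.get(f"{{{EV}}}handler")
        if event and handler:
            # The handler attribute is a fragment reference such as "#voice_city".
            bindings.append((element.get("id"), event, handler.lstrip("#")))
    return bindings


print(voice_bindings(PAGE))
```

The point of the sketch is that the visual element and the voice form stay separate pieces of markup; only the two `ev:` attributes tie them together.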

X+V promises to deliver the feature set, flexibility, and ease of use that developers need to write one application that supports visual-only, voice-only, and multimodal interaction. The versatility of the Web and XML is reflected in the fact that X+V nicely integrates VoiceXML into the Web by marrying it with XHTML. X+V brings voice markup to the presentation layer, allowing you to speech-enable each component of the application interface.

Combining Voice and Visual Markup
Visual markup tells a Web browser what you want the user interface to look like and how you want it to behave when the user types, points, or clicks. Similarly, voice markup tells the Web browser what you want it to do when the user speaks to it. For visual markup, the browser uses a graphics engine; for voice markup, the browser uses a speech engine.

While both X+V and SALT (Speech Application Language Tags), a competing approach to adding speech to Web pages, use W3C standards for grammar and speech synthesis, only X+V is based entirely on standardized languages. X+V's modular architecture makes it very simple to separate an X+V application into different components. As a result, X+V applications can be coded in parts, with experts in voice programming developing voice elements and experts in visual programming developing visual ones. X+V's modularity also makes it adaptable to stand-alone voice application development. VoiceXML used in an X+V application can be reused inside a stand-alone VoiceXML application. SALT's reliance on the containing environment makes it very difficult to separate out its coding functions, and also makes the language insufficient for stand-alone application development.

Richness is another factor that differentiates the two languages. Whereas SALT defines three tags - Prompt, Listen, and Bind - as its tag set for speech, X+V is based on the mature and tested VoiceXML standard. Because it uses VoiceXML's Form construct for its speech tag set, X+V includes all the utility of "prompt, listen, and bind," and more.

Just as visual markup specifies the visual interface items, voice markup specifies the voice interface items. Speech-enabling an application interface is a matter of first breaking the visual interface into its basic components (for example, an input field for a time of day and a checkbox for "a.m." or "p.m."), creating snippets of voice markup for each component, and then associating the snippets to the existing visual markup for each component. Consider the following questions:

  • What words should the speech engine speak or synthesize?
  • What words and phrases should the speech engine listen for?
  • What should the browser do if the speech engine doesn't recognize a word or phrase?
  • What will be the result of the speech engine recognizing a word or phrase that has been spoken?
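
The four questions above map naturally onto the fields of a small data structure. The following Python sketch is only a model of that idea, not X+V itself: the `VoiceSnippet` class, its field names, and the a.m./p.m. phrases are all hypothetical, chosen to mirror the checkbox example in the text.

```python
from dataclasses import dataclass


@dataclass
class VoiceSnippet:
    """Hypothetical model of one voice snippet for one visual component."""
    prompt: str            # what the speech engine speaks
    phrases: dict          # what it listens for, mapped to the resulting value
    no_match_prompt: str   # what to say when nothing is recognized

    def respond(self, heard):
        """Return ("fill", value) on recognition, else ("reprompt", text)."""
        value = self.phrases.get(heard.strip().lower())
        if value is None:
            return ("reprompt", self.no_match_prompt)
        return ("fill", value)


# Snippet for the a.m./p.m. checkbox mentioned in the text.
meridiem = VoiceSnippet(
    prompt="Morning or afternoon?",
    phrases={"a.m.": "am", "morning": "am", "p.m.": "pm", "afternoon": "pm"},
    no_match_prompt="Sorry, please say a.m. or p.m.",
)

print(meridiem.respond("Morning"))
print(meridiem.respond("noon"))
```

Writing one such snippet per visual component keeps each piece small enough to design and test on its own before it is associated with the visual markup.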
Correlating Voice and Visual Input/Output
Given an application's visual markup plus a collection of voice markup snippets, you have almost everything you need to create the presentation layer of a multimodal Web application. In fact, the only thing you still need is a way to tell the browser which snippets of voice markup go with which visual elements, and (because a speech engine can only have one snippet active at a time) when to activate each snippet of voice markup.

Given that the Web application environment is event-driven, X+V incorporates the DOM eventing framework used in the XML-Events standard. Using this framework, X+V defines the familiar event types from HTML such as "on mouse-over" or "on input focus" to create the correlation between visual and voice markup. Using XML-Events provides X+V with a uniform and standards-based eventing model that enables event integration between XML languages.
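
The correlation described here can be pictured as a lookup from an (element, event) pair to a voice snippet, with only one snippet active at a time. The Python sketch below simulates that behavior under stated assumptions: the `VoiceActivator` class and the element and snippet ids are hypothetical, and leaving the active snippet unchanged on an unbound event is a simplification of this sketch rather than something the X+V profile specifies.

```python
class VoiceActivator:
    """Hypothetical simulation of event-driven voice snippet activation."""

    def __init__(self):
        self._bindings = {}   # (element id, event name) -> snippet id
        self.active = None    # the single snippet the speech engine is running

    def bind(self, element_id, event, snippet_id):
        """Associate a voice snippet with an event on a visual element."""
        self._bindings[(element_id, event)] = snippet_id

    def dispatch(self, element_id, event):
        """Activate the bound snippet, replacing whichever one was active."""
        snippet_id = self._bindings.get((element_id, event))
        if snippet_id is not None:
            self.active = snippet_id
        return self.active


activator = VoiceActivator()
activator.bind("time", "focus", "voice_time")
activator.bind("meridiem", "focus", "voice_meridiem")

print(activator.dispatch("time", "focus"))      # user tabs into the time field
print(activator.dispatch("meridiem", "focus"))  # user moves to the checkbox
```

Because activation is keyed on ordinary focus events, the same user action that moves the visual cursor also switches what the speech engine is listening for.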

Separate Files and Reuse
Because all the parts of X+V are XML-compliant, the voice markup can be packaged in two ways: in the same file as the XHTML or in separate files. Separating voice markup from visual markup gives you more flexibility in developing your applications. For example, you can develop the voice markup separately from the visual markup and combine the two later.

Another advantage of keeping the files separate is reuse, such as the ability to reuse snippets of VoiceXML in numerous XHTML pages. In the example of a flight-reservation application, when users make the reservation they will be asked if they want a one-way, round-trip, or multi-leg reservation. For each answer, the system will call up a different form. While the three forms differ with regard to the type of trip desired, each one has the same departure city. If you have separated the voice snippet for the departure city you can reuse it in each of the three different XHTML forms, or containers.

The final advantage of keeping the VoiceXML separate from the XHTML is that it allows the snippets of VoiceXML to be reused in containers other than XHTML. In this case, X+V can utilize the VoiceXML notion of documents and forms, wherein a VoiceXML document contains one or more forms. You already know that VoiceXML forms can be linked to XHTML to create multimodal applications. But such forms can also be stitched together in a VoiceXML document (or container) to create voice-only applications. The end result is that you can (by reuse) create a single application that simultaneously supports multimodal browsers, GUI-only browsers, and voice-only systems such as IVRs.
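
The reuse argument can be sketched in a few lines of Python. Everything named here is hypothetical: the snippet library, the prompts, and the three trip pages stand in for separately maintained VoiceXML and XHTML files. The sketch builds each container from the same snippet definitions and shows that the departure-city form ends up in all three multimodal pages as well as in a voice-only VoiceXML document.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
VXML = "http://www.w3.org/2001/vxml"

# Hypothetical snippet library: form id -> prompt text.
LIBRARY = {
    "departure_city": "Which city are you leaving from?",
    "arrival_city": "Which city are you flying to?",
    "return_date": "When do you want to come back?",
}

# Which snippets each (hypothetical) reservation page needs.
TRIP_PAGES = {
    "one_way": ["departure_city", "arrival_city"],
    "round_trip": ["departure_city", "arrival_city", "return_date"],
    "multi_leg": ["departure_city", "arrival_city"],
}


def make_form(snippet_id):
    """Build a VoiceXML form element from a library entry."""
    form = ET.Element(f"{{{VXML}}}form", {"id": snippet_id})
    field = ET.SubElement(form, f"{{{VXML}}}field", {"name": snippet_id})
    ET.SubElement(field, f"{{{VXML}}}prompt").text = LIBRARY[snippet_id]
    return form


def multimodal_page(trip_type):
    """XHTML container whose head carries the voice forms for one trip type."""
    html = ET.Element(f"{{{XHTML}}}html")
    head = ET.SubElement(html, f"{{{XHTML}}}head")
    for snippet_id in TRIP_PAGES[trip_type]:
        head.append(make_form(snippet_id))
    return html


def voice_only_document():
    """VoiceXML container that stitches every form together for voice-only use."""
    vxml = ET.Element(f"{{{VXML}}}vxml", {"version": "2.0"})
    for snippet_id in LIBRARY:
        vxml.append(make_form(snippet_id))
    return vxml


def form_ids(tree):
    """List the ids of all VoiceXML forms found in a container."""
    return [form.get("id") for form in tree.iter(f"{{{VXML}}}form")]


for trip_type in TRIP_PAGES:
    print(trip_type, form_ids(multimodal_page(trip_type)))
print("voice_only", form_ids(voice_only_document()))
```

The snippet definitions are written once; only the container changes, which is the property the flight-reservation example relies on.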

Conclusion
X+V is the latest addition to the XML family of technologies for user interface development. Whereas XHTML is for developing visual interfaces, and VoiceXML focuses entirely on voice-based development, X+V is a hybrid, dedicated to developing multimodal application interfaces. X+V is particularly well suited to wireless development, where developers are faced with small visual interfaces and increasing user demand for voice input and output.

X+V's foundation in existing XML standards lends it tremendous strength and versatility. Interfaces developed using X+V are portable to a wide range of applications and development environments, can be easily developed in teams, and are highly scalable over time. Developers working with X+V can access the numerous resources that come with a well-developed standard such as XML. X+V also spares developers from learning a new development language such as SALT, or adapting to the constraints of a more visually oriented development environment. Perhaps best of all, X+V does not require training in voice user interfaces or linguistics to operate; a basic knowledge of XML and related standards is sufficient to get started.

About Les Wilson
Les Wilson is an IBM senior technical staff member. He has been responsible for a variety of research and development projects related to man-machine interfaces, graphics, network computing, and user-interface technology. Les is currently the multimodal architect for IBM's Pervasive Computing Division.


Reader Feedback: Page 1 of 1

Les Wilson wrote: Short answer yes. Long Answer:
1) X+V uses a standardized (W3C term is "Recommended") language for the voice markup whereas SALT does not.
2) X+V specifies XHTML as the "containing" GUI language.
3) X+V uses XMLEvents (another W3C "Rec") as the syntax for the application developer to specify the events that activate voice handlers. SALT leaves this up to the language into which it is being integrated.
That is, in addition to the Synthesis and Grammar formats, the "X", the "+", and the "V" of X+V are all W3C recommendations. One advantage of this characteristic of X+V is that it specifies a platform that in turn enables portability of the application. Additionally, specifying the linkage between visual and voice languages using XMLEvents enables standards-based interoperability between devices and servers for platform implementations that choose to distribute function across that boundary (e.g. distributing voice processing to a server but doing the GUI in the client).

quEzztion wrote: I understand that both X+V and SALT use W3C standards for grammar and speech synthesis, but is X+V the *only* one of the two of them based entirely on standardized languages?

Les Wilson wrote: Interesting