Look Ma Bell, No Hands! - VoiceXML, X+V, and the Mobile Device

The emerging world without wires has fostered a growing number of small and mobile devices (everything from PDAs to smart phones) capable of accessing data and running applications. The trouble is, while devices are getting smaller, human hands and fingers are not.

To assist users in managing their devices, user interface designers have begun to combine the traditional keyboard-input model with such interactive technologies as voice-directed input. This type of interaction, in which the user has more than one means of accessing data in his or her device, is sometimes called multimodal interaction. It is fast becoming the norm in the world of wireless mobile computing.

If asked, most developers will cite speed and efficiency as the main reasons for developing multimodal interfaces. Parallel input, for example the ability to both key in commands and speak them, lets users access and respond to information on their devices more quickly. But multimodal systems do more than speed up interaction; they also improve the overall experience. Multimodal interfaces leave more room for user preference (users choose how they interact with the system) and reduce the overexertion that can result from single-modality interaction. Being able to switch between modes can also lower the incidence of errors (because users can pick the mode best suited to each activity) and make error recovery easier. Finally, multimodal interfaces can accommodate a wider range of tasks and environments.

Speech adds tremendous value to small mobile devices, and at the same time mobility and wireless connectivity are moving computing into new physical environments. Wireless networks now provide connectivity anywhere and anytime, linking mobile devices to back-end data wherever they are. If multimodal interaction is to extend across the network, the Internet needs new technologies and standards to support it. Increasingly, Web developers are seeking ways to turn existing visually oriented Web pages into multimodal ones. And that's where X+V comes in.

The XHTML+Voice profile brings spoken interaction to standard Web content by integrating the mature XHTML and XML-Events technologies with XML vocabularies developed as part of the W3C Speech Interface Framework. The profile includes voice modules that support speech synthesis, speech dialogs, command and control, and speech grammars. Voice handlers can be attached to XHTML elements and respond to specific Document Object Model (DOM) events, thereby reusing the event model familiar to Web developers. Voice interaction features are integrated with XHTML and CSS and can consequently be used directly within XHTML content.
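To make the attachment mechanism concrete, here is a minimal Python sketch that reads a trimmed, hypothetical X+V page with the standard library's `xml.etree.ElementTree` and lists which XHTML element triggers which voice form on which DOM event. The namespace URIs are the W3C ones for XHTML, VoiceXML, and XML Events; the page content, the ids `voice_city` and `city`, and the helper `voice_bindings` are illustrative assumptions, and a real X+V page would also declare a speech grammar for the field.

```python
import xml.etree.ElementTree as ET

# W3C namespace URIs for XHTML, VoiceXML, and XML Events.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "vxml": "http://www.w3.org/2001/vxml",
    "ev": "http://www.w3.org/2001/xml-events",
}
EV = "{%s}" % NS["ev"]

# Hypothetical, trimmed X+V page: one voice form in the head, attached to a
# text field through XML Events attributes (no grammar, to keep it short).
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Departure</title>
    <vxml:form id="voice_city">
      <vxml:field name="city">
        <vxml:prompt>Which city are you leaving from?</vxml:prompt>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <input type="text" id="city" ev:event="focus" ev:handler="#voice_city"/>
  </body>
</html>"""


def voice_bindings(xml_text):
    """Return (element id, DOM event, prompt text) for each attached voice handler."""
    root = ET.fromstring(xml_text)
    # Index VoiceXML forms by id so that '#id' handler references can be resolved.
    forms = {f.get("id"): f for f in root.iter("{%s}form" % NS["vxml"])}
    bindings = []
    for el in root.iter():
        event, handler = el.get(EV + "event"), el.get(EV + "handler")
        if event and handler and handler.startswith("#"):
            form = forms.get(handler[1:])
            prompt = form.find(".//vxml:prompt", NS) if form is not None else None
            text = prompt.text.strip() if prompt is not None else None
            bindings.append((el.get("id"), event, text))
    return bindings


for binding in voice_bindings(PAGE):
    print(binding)
```

The voice form is addressed purely by its `id`, which is what lets the same XHTML element model carry both visual and voice behavior.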

X+V promises to deliver the feature set, flexibility, and ease of use that developers need to write one application that supports visual-only, voice-only, and multimodal interaction. By marrying VoiceXML with XHTML, X+V brings VoiceXML into the Web, reflecting the versatility of both the Web and XML. X+V places voice markup in the presentation layer, allowing you to speech-enable each component of the application interface.

Combining Voice and Visual Markup
Visual markup tells a Web browser what you want the user interface to look like and how you want it to behave when the user types, points, or clicks. Similarly, voice markup tells the Web browser what you want it to do when the user speaks to it. For visual markup, the browser uses a graphics engine; for voice markup, the browser uses a speech engine.

While both X+V and SALT (Speech Application Language Tags) use W3C standards for grammar and speech synthesis, only X+V is based entirely on standardized languages. X+V's modular architecture makes it simple to separate an X+V application into components. As a result, X+V applications can be coded in parts, with voice programming experts developing the voice elements and visual programming experts developing the visual ones. X+V's modularity also makes it suitable for stand-alone voice application development: VoiceXML used in an X+V application can be reused inside a stand-alone VoiceXML application. SALT's reliance on its containing environment, by contrast, makes it very difficult to separate out its coding functions and leaves the language ill-suited to stand-alone application development.

Richness is another factor that differentiates the two languages. Whereas SALT defines three tags - Prompt, Listen, and Bind - as its tag set for speech, X+V is based on the mature and tested VoiceXML standard. Because it uses VoiceXML's Form construct for its speech tag set, X+V includes all the utility of "prompt, listen, and bind," and more.

Just as visual markup specifies the visual interface items, voice markup specifies the voice interface items. Speech-enabling an application interface is a matter of first breaking the visual interface into its basic components (for example, an input field for a time of day and a checkbox for "a.m." or "p.m."), then creating a snippet of voice markup for each component, and finally associating each snippet with the existing visual markup for its component. For each component, consider the following questions:

  • What words should the speech engine speak or synthesize?
  • What words and phrases should the speech engine listen for?
  • What should the browser do if the speech engine doesn't recognize a word or phrase?
  • What will be the result of the speech engine recognizing a word or phrase that has been spoken?
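The four questions map naturally onto four parts of a voice snippet. The Python sketch below models one snippet for the a.m./p.m. checkbox mentioned in the text; the class `VoiceSnippet`, its fields, the prompt wording, and the phrase list are all hypothetical, and the exact dictionary lookup merely stands in for a real speech grammar and recognizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class VoiceSnippet:
    """Hypothetical model: one voice snippet per visual component."""
    prompt: str                      # What should the speech engine speak?
    phrases: Dict[str, str]          # What should it listen for? (phrase -> value)
    nomatch_prompt: str              # What if nothing is recognized?
    on_match: Callable[[str], None]  # What happens once a phrase is recognized?

    def handle(self, utterance: str) -> Optional[str]:
        # Exact lookup stands in for a real speech grammar and recognizer.
        value = self.phrases.get(utterance.strip().lower())
        if value is None:
            print(self.nomatch_prompt)  # re-prompt rather than guess
            return None
        self.on_match(value)
        return value


# Visual state of the form: the a.m./p.m. checkbox.
visual_form = {"meridiem": None}

am_pm = VoiceSnippet(
    prompt="Is that in the morning or the afternoon?",
    phrases={"a.m.": "am", "morning": "am", "p.m.": "pm",
             "afternoon": "pm", "evening": "pm"},
    nomatch_prompt="Sorry, please say a.m. or p.m.",
    on_match=lambda value: visual_form.update(meridiem=value),
)

print(am_pm.prompt)
am_pm.handle("Afternoon")
print(visual_form)
```

Keeping the recognition result flowing into the same `visual_form` state is the point of the design: voice and keyboard input end up updating one shared model of the page.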
Correlating Voice and Visual Input/Output
Given an application's visual markup plus a collection of voice markup snippets, you have almost everything you need to create the presentation layer of a multimodal Web application. In fact, the only thing you still need is a way to tell the browser which snippets of voice markup go with which visual elements, and (because a speech engine can only have one snippet active at a time) when to activate each snippet of voice markup.

Because the Web application environment is event-driven, X+V incorporates the DOM eventing framework defined by the XML Events standard. Using this framework, X+V relies on familiar HTML event types such as "on mouse-over" or "on input focus" to correlate visual and voice markup. XML Events gives X+V a uniform, standards-based eventing model that enables event integration between XML languages.
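The correlation rule described in this section (an event on a visual element activates its voice snippet, and only one snippet can be active at a time) can be sketched as a tiny dispatcher. This is a hypothetical Python model of the behavior, not the internals of any real multimodal browser; the class `VoiceBrowser`, the element ids, and the snippet names are illustrative assumptions.

```python
class VoiceBrowser:
    """Hypothetical model: DOM events activate voice snippets, one at a time."""

    def __init__(self):
        self.bindings = {}   # (element id, event type) -> snippet name
        self.active = None   # the single snippet the speech engine is running
        self.history = []    # log of activations and deactivations

    def bind(self, element_id, event, snippet):
        """Associate a snippet with an event on a visual element."""
        self.bindings[(element_id, event)] = snippet

    def dispatch(self, element_id, event):
        """Deliver a DOM event; switch the active snippet if one is bound."""
        snippet = self.bindings.get((element_id, event))
        if snippet is not None and snippet != self.active:
            if self.active is not None:
                # Only one snippet may be active, so the old one is dropped first.
                self.history.append(f"deactivate {self.active}")
            self.history.append(f"activate {snippet}")
            self.active = snippet
        return self.active


browser = VoiceBrowser()
browser.bind("time", "focus", "voice_time")
browser.bind("ampm", "focus", "voice_ampm")
browser.dispatch("time", "focus")
browser.dispatch("ampm", "focus")
browser.dispatch("ampm", "mouseover")  # no binding, so nothing changes
print(browser.active)
print(browser.history)
```

Events with no bound snippet leave the active snippet alone, which mirrors how unbound DOM events simply do not reach a voice handler.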

Separate Files and Reuse
Because all the parts of X+V are XML-compliant, the voice markup can be packaged in two ways: in the same file as the XHTML or in separate files. Separating voice markup from visual markup gives you more flexibility in developing your applications. For example, you can develop the voice markup separately from the visual markup and combine the two later.

Another advantage of keeping the files separate is reuse, such as the ability to reuse snippets of VoiceXML in numerous XHTML pages. Consider a flight-reservation application: when users make a reservation, they are asked whether they want a one-way, round-trip, or multi-leg trip, and the system calls up a different form for each answer. Although the three forms differ in the type of trip, each one asks for the same departure city. If you have kept the voice snippet for the departure city separate, you can reuse it in each of the three XHTML forms, or containers.
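A minimal Python sketch of that reuse, assuming the departure-city snippet is written once and copied into each trip-type page with `xml.etree.ElementTree`. In practice the snippet might live in its own file; here it is kept in memory for brevity. The ids `voice_departure` and `departure`, the prompt text, and the helpers `q` and `trip_page` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
VXML = "http://www.w3.org/2001/vxml"
EV = "http://www.w3.org/2001/xml-events"
ET.register_namespace("", XHTML)
ET.register_namespace("vxml", VXML)
ET.register_namespace("ev", EV)


def q(ns, tag):
    """Clark-notation name, e.g. '{http://www.w3.org/2001/vxml}form'."""
    return f"{{{ns}}}{tag}"


# The departure-city voice snippet is written once (hypothetical content).
DEPARTURE = ET.Element(q(VXML, "form"), {"id": "voice_departure"})
field = ET.SubElement(DEPARTURE, q(VXML, "field"), {"name": "departure"})
ET.SubElement(field, q(VXML, "prompt")).text = "Which city are you leaving from?"


def trip_page(trip_type):
    """Build an XHTML container for one trip type that reuses the shared snippet."""
    html = ET.Element(q(XHTML, "html"))
    head = ET.SubElement(html, q(XHTML, "head"))
    ET.SubElement(head, q(XHTML, "title")).text = f"{trip_type} reservation"
    head.append(copy.deepcopy(DEPARTURE))  # same voice markup in every container
    body = ET.SubElement(html, q(XHTML, "body"))
    ET.SubElement(body, q(XHTML, "input"), {
        "type": "text",
        "id": "departure",
        q(EV, "event"): "focus",
        q(EV, "handler"): "#voice_departure",
    })
    return html


pages = {t: trip_page(t) for t in ("one-way", "round-trip", "multi-leg")}
print(ET.tostring(pages["round-trip"], encoding="unicode"))
```

Changing the prompt in `DEPARTURE` changes it for all three reservation types, which is the maintenance benefit the text describes.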

The final advantage of keeping the VoiceXML separate from the XHTML is that it allows the snippets of VoiceXML to be reused in containers other than XHTML. In this case, X+V can utilize the VoiceXML notion of documents and forms, wherein a VoiceXML document contains one or more forms. You already know that VoiceXML forms can be linked to XHTML to create multimodal applications. But such forms can also be stitched together in a VoiceXML document (or container) to create voice-only applications. The end result is that you can (by reuse) create a single application that simultaneously supports multimodal browsers, GUI-only browsers, and voice-only systems such as IVRs.
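The same snippets can be stitched into a voice-only VoiceXML document. The Python sketch below, again using `xml.etree.ElementTree`, wraps hypothetical forms in a `<vxml version="2.0">` root and chains each form to the next with a `<block>` containing `<goto next="#...">`. Chaining through `<goto>` is one simple way to sequence forms and is an assumption of this sketch, not something the article prescribes; the helpers `voice_form` and `stitch` and all ids and prompts are illustrative.

```python
import copy
import xml.etree.ElementTree as ET

VXML = "http://www.w3.org/2001/vxml"
ET.register_namespace("", VXML)


def voice_form(form_id, field_name, prompt):
    """A small VoiceXML form with one field (hypothetical content)."""
    form = ET.Element(f"{{{VXML}}}form", {"id": form_id})
    field = ET.SubElement(form, f"{{{VXML}}}field", {"name": field_name})
    ET.SubElement(field, f"{{{VXML}}}prompt").text = prompt
    return form


def stitch(forms):
    """Wrap forms in one VoiceXML document, chaining each to the next with <goto>."""
    doc = ET.Element(f"{{{VXML}}}vxml", {"version": "2.0"})
    for i, original in enumerate(forms):
        form = copy.deepcopy(original)  # leave the shared snippets untouched
        if i + 1 < len(forms):
            block = ET.SubElement(form, f"{{{VXML}}}block")
            ET.SubElement(block, f"{{{VXML}}}goto",
                          {"next": "#" + forms[i + 1].get("id")})
        doc.append(form)
    return doc


snippets = [
    voice_form("voice_departure", "departure", "Which city are you leaving from?"),
    voice_form("voice_destination", "destination", "Where are you flying to?"),
]
print(ET.tostring(stitch(snippets), encoding="unicode"))
```

Because `stitch` works on copies, the same `snippets` list can still be attached to XHTML pages for the multimodal version of the application.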

Conclusion
X+V is the latest addition to the XML family of technologies for user interface development. Whereas XHTML is for developing visual interfaces, and VoiceXML focuses entirely on voice-based development, X+V is a hybrid, dedicated to developing multimodal application interfaces. X+V is particularly well suited to wireless development, where developers are faced with small visual interfaces and increasing user demand for voice input and output.

X+V's foundation in existing XML standards lends it tremendous strength and versatility. Interfaces developed using X+V are portable to a wide range of applications and development environments, can be easily developed in teams, and are highly scalable over time. Developers working with X+V can draw on the numerous resources that come with a well-developed standard such as XML. X+V also spares developers from learning a new development language such as SALT or adapting to the constraints of a more visually oriented development environment. Perhaps best of all, X+V does not require training in voice user interfaces or linguistics; a basic knowledge of XML and related standards is sufficient to get started.

About Les Wilson
Les Wilson is an IBM senior technical staff member. He has been responsible for a variety of research and development projects related to man-machine interfaces, graphics, network computing, and user-interface technology. Les is currently the multimodal architect for IBM's Pervasive Computing Division.

Reader Feedback

Les Wilson wrote: Short answer yes. Long answer:
1) X+V uses a standardized (the W3C term is "Recommendation") language for the voice markup, whereas SALT does not.
2) X+V specifies XHTML as the "containing" GUI language.
3) X+V uses XML Events (another W3C Recommendation) as the syntax for the application developer to specify the events that activate voice handlers. SALT leaves this up to the language into which it is being integrated.
That is, in addition to the Synthesis and Grammar formats, the "X", the "+", and the "V" of X+V are all W3C Recommendations. One advantage of this characteristic of X+V is that it specifies a platform that in turn enables portability of the application. Additionally, specifying the linkage between visual and voice languages using XML Events enables standards-based interoperability between devices and servers for platform implementations that choose to distribute function across that boundary (e.g., distributing voice processing to a server but doing the GUI in the client).

quEzztion wrote: I understand that both X+V and SALT use W3C standards for grammar and speech synthesis, but is X+V the *only* one of the two based entirely on standardized languages?

Interesting

