Thanks for the discussion - also cool to see more interest today (http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024453.html)
I've hacked up a proof-of-concept JavaScript API for speech recognition
and synthesis. It adds a navigator.speech object with these functions:

  void listen(ListenCallback callback, ListenOptions options);
  void speak(DOMString text, SpeakCallback callback, SpeakOptions options);

The implementation uses an NPAPI plugin for the Android browser that
wraps the existing Android speech APIs. The code is available at
http://code.google.com/p/speech-api-browser-plugin/

There are some simple demo apps in
http://code.google.com/p/speech-api-browser-plugin/source/browse/trunk/android-plugin/demos/
including:

- English to Spanish speech-to-speech translation
- Google search by speaking a query
- The obligatory pizza ordering system
- A phone number dialer

Comments appreciated!

/Bjorn

On Fri, Dec 4, 2009 at 2:51 PM, Olli Pettay <olli.pet...@helsinki.fi> wrote:
> Indeed the API should be something significantly simpler than X+V.
> Microsoft has (had?) support for SALT. That API is pretty simple and
> provides speech recognition and TTS.
> The API could probably be even simpler than SALT.
> IIRC, there was an extension for Firefox to support SALT (well, there was
> also an extension to support X+V).
>
> If the platform/OS provides ASR and TTS, adding a JS API for it should
> be pretty simple. X+V tries to handle some logic using VoiceXML FIA, but
> I think it would be more web-like to give a pure JS API (similar to SALT).
> Integrating visual and voice input could be done in scripts. I'd assume
> there would be some script libraries to handle multimodal input integration
> - especially if there will be touch and gesture events too, etc. (Classic
> multimodal map applications will become possible on the web.)
>
> But this all is something which should possibly be designed in or with the
> W3C multimodal working group. I know their current architecture is way more
> complex, but X+V, SALT and even Multimodal-CSS have been discussed in that
> working group.
>
> -Olli
>
> On 12/3/09 2:50 AM, Dave Burke wrote:
>>
>> We're envisaging a simpler programmatic API that looks familiar to the
>> modern Web developer but one which avoids the legacy of dialog system
>> languages.
>>
>> Dave
>>
>> On Wed, Dec 2, 2009 at 7:25 PM, João Eiras <jo...@opera.com
>> <mailto:jo...@opera.com>> wrote:
>>
>> On Wed, 02 Dec 2009 12:32:07 +0100, Bjorn Bringert
>> <bring...@google.com <mailto:bring...@google.com>> wrote:
>>
>> We've been watching our colleagues build native apps that use speech
>> recognition and speech synthesis, and would like to have JavaScript
>> APIs that let us do the same in web apps. We are thinking about
>> creating a lightweight and implementation-independent API that lets
>> web apps use speech services. Is anyone else interested in that?
>>
>> Bjorn Bringert, David Singleton, Gummi Hafsteinsson
>>
>>
>> This exists already, but only Opera supports it, although there are
>> problems with the library we use for speech recognition.
>>
>> http://www.w3.org/TR/xhtml+voice/
>> http://dev.opera.com/articles/view/add-voice-interactivity-to-your-site/
>>
>> Would be nice to revive that specification and get vendor buy-in.
>>
>> --
>> João Eiras
>> Core Developer, Opera Software ASA, http://www.opera.com/

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
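[Editor's sketch] To make the call pattern of the proposed signatures concrete - listen(ListenCallback, ListenOptions) and speak(text, SpeakCallback, SpeakOptions) - here is a minimal usage sketch. A synchronous stub stands in for the real NPAPI plugin, and the option fields (language) and result fields (text, confidence) are illustrative assumptions, not part of the posted API:

```javascript
// Stub navigator.speech object standing in for the real plugin, so the
// end-to-end callback shape can be shown without a browser or device.
var navigator = {
  speech: {
    listen: function (callback, options) {
      // A real implementation would invoke the platform recognizer;
      // here we deliver one fake recognition result synchronously.
      callback({ text: "order a pizza", confidence: 0.9 });
    },
    speak: function (text, callback, options) {
      // A real implementation would run TTS; we just signal completion.
      callback({ done: true });
    }
  }
};

// Usage following the posted signatures: recognize an utterance,
// then echo it back via synthesis.
navigator.speech.listen(function (result) {
  console.log("heard: " + result.text);        // prints "heard: order a pizza"
  navigator.speech.speak("You said: " + result.text, function () {
    console.log("done speaking");              // prints "done speaking"
  }, { language: "en" });
}, { language: "en" });
```

Because both entry points take a callback, chaining recognition into synthesis (as in the speech-to-speech translation demo) falls out naturally from nesting.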