On Tue, 18 May 2010 10:52:53 +0200, Bjorn Bringert <bring...@google.com> wrote:
On Tue, May 18, 2010 at 8:02 AM, Anne van Kesteren <ann...@opera.com> wrote:
I wonder how it relates to the <device> proposal already in the draft. In theory that supports microphone input too.

It would be possible to implement speech recognition on top of a
microphone input API. The most obvious approach would be to use
<device> to get an audio stream, and send that audio stream to a
server (e.g. using WebSockets). The server runs a speech recognizer
and returns the results.
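A rough sketch of what that layered approach could look like, using the draft <device> API (the stream and recorder interfaces here are illustrative guesses; the draft left the audio encoding and chunking model unspecified):

```js
// Hypothetical sketch: capture audio with the draft <device> element
// and ship it to a recognition server over WebSockets.
// <device type=media> in the markup; names below are illustrative.
var device = document.getElementsByTagName('device')[0];
device.onchange = function () {
  var stream = device.data;                  // audio Stream from <device>
  var ws = new WebSocket('ws://example.org/recognizer');
  ws.onopen = function () {
    // Assumed recorder interface; encoding is an open question.
    var recorder = stream.record();
    setInterval(function () {
      ws.send(recorder.takeBlob());          // push encoded audio chunks
    }, 250);
  };
  ws.onmessage = function (event) {
    // Server-side recognizer returns transcripts as messages.
    console.log('Transcript: ' + event.data);
  };
};
```

This is exactly the division of labor described above: the browser only provides capture and transport, and all recognition lives in the web app's own service.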

Advantages of the speech input element:

- Web app developers do not need to build and maintain a speech
recognition service.

- Implementations can choose to use client-side speech recognition.
This could give reduced network traffic and latency (but probably also
reduced recognition accuracy and language support). Implementations
could also use server-side recognition by default, switching to local
recognition in offline or low bandwidth situations.

- Using a general audio capture API would require APIs for things like
audio encoding and audio streaming. Judging from the past results of
specifying media features, this may be non-trivial. The speech input
element turns all audio processing concerns into implementation
details.

- Implementations can have special UI treatment for speech input,
which may be different from that for general audio capture.

I guess I don't really see why this cannot be added on top of the <device> element. Maybe it is indeed better to separate the two, though. The reason I'm mostly asking is that one reason we went with <device> rather than <input> is that the result of the user operation is not something that will partake in form submission. Obviously, a lot of today's use cases for form controls are handled by script rather than form submission, but all the controls that exist can be used as part of form submission. <input type=speech> does not seem like it can.
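To illustrate the form-submission point: with the proposed markup, a recognition result would presumably have to be copied into a submittable control by script, rather than submitting on its own (the event and property names here are illustrative, not taken from the draft):

```html
<!-- Hypothetical sketch of the contrast. -->
<form action="/search">
  <!-- An ordinary control submits its value with the form: -->
  <input type="text" name="q">

  <!-- A speech control yields a recognition result that script
       would have to copy into a submittable field; the event and
       'results' property below are illustrative assumptions: -->
  <input type="speech"
         onspeechchange="document.forms[0].q.value = this.results[0]">
</form>
```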


Advantages of using a microphone API:

- Web app developers get complete control over the quality and
features of the speech recognizer. This is a moot point for most
developers though, since they do not have the resources to run their
own speech recognition service.

- Fewer features to implement in browsers (assuming that a microphone
API would be added anyway).

Right, and I am pretty positive we will add a microphone API. What could be done, e.g., is to have a speech recognition object of some sort that you can feed the audio stream that comes out of <device>. (Or indeed you feed the stream to a server via WebSocket.)
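That combination might look something like the following (the SpeechRecognizer object and all of its members are purely illustrative, sketching the idea rather than any drafted API):

```js
// Hypothetical sketch: a recognizer object consuming a <device> stream,
// so recognition is decoupled from capture. All names are illustrative.
var device = document.getElementsByTagName('device')[0];
device.onchange = function () {
  var recognizer = new SpeechRecognizer();   // assumed browser-provided object
  recognizer.lang = 'en-US';
  recognizer.onresult = function (event) {
    console.log('Heard: ' + event.transcript);
  };
  recognizer.start(device.data);             // feed it the <device> audio stream
};
```

The design choice this sketches is the one described above: the same <device> stream could go to a local or remote recognizer object, or straight to a WebSocket, without a dedicated speech input element.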


--
Anne van Kesteren
http://annevankesteren.nl/
