Re: [whatwg] Java language bindings for HTML5

2010-05-18 Thread Shiki Okasaka
On Tue, May 18, 2010 at 3:27 PM, Anne van Kesteren  wrote:
> On Tue, 18 May 2010 04:38:21 +0200, Shiki Okasaka  wrote:
>>
>> On Mon, May 17, 2010 at 6:27 PM, Kühn Wolfgang  wrote:
>>>
>>> Hi,
>>> As for the html5 elements, will there be a new package org.w3c.dom.html5?
>>
>> This is our concern, too. Historically each W3C specification
>> introduced its own module name. However, recent specifications
>> tend to omit the module declaration in their IDL definitions.
>>
>>    cf.
>> http://lists.w3.org/Archives/Public/public-webapps/2009AprJun/1380.html
>>
>> In the IDL files we used above, we chose module names that seem to be
>> practical, but those are not part of the standard. Hopefully more
>> people will revisit this issue sometime soon.
>
> Can't they all just use org.w3c.dom? We cannot make the interface names
> overlap anyway.

I think one module name for the whole Web platform would work fine
for programming languages that have only recently joined the Web platform.
But for languages like Java, I think it would be nice to have a rule
for deriving module names.

I'm curious how the directory name (cssom, workers, postmsg, etc.) is
assigned to each specification today. Could we use the same name as the
module name in most cases? It wouldn't work for cssom and
cssom-view, though.

 - Shiki


>
>
> --
> Anne van Kesteren
> http://annevankesteren.nl/
>


Re: [whatwg] New File attributes creationDate, modificationDate and size

2010-05-18 Thread Arun Ranganathan

On 5/18/10 2:45 PM, Eric Uhrhane wrote:

On Fri, May 14, 2010 at 5:05 PM, Arun Ranganathan  wrote:
   

On 5/12/10 4:25 AM, Ashley Sheridan wrote:
 

On Wed, 2010-05-12 at 00:05 -0400, Biju wrote:


   

It would be good if we could also get the same information on the server
side when a user uploads a file using a form with file controls,
i.e. like the suggestion at
https://bugzilla.mozilla.org/show_bug.cgi?id=549253
(it works even with JavaScript disabled).

Also remember that modificationDate can be a time before creationDate on
Windows.
This is because modificationDate gets copied when you copy a file,
so it essentially shows the modification date of the actual content.

creationDate, on the other hand, is the time the file was created in its
folder/directory, so when you copy a file to a new directory it will
show the time of the copy.

PS: for the JS option there is Mozilla bug 390776

 


   

I intend to update the File API so that the File object exposes creationDate
and modificationDate.
 

You might want to consider making an async getMetadata function; see
discussion ending at [1].  Async because the modification time can
change often, and as a generic all-metadata function because it's easy
to expand and experiment with.  And if you put that right in the File
API, then I can inherit it from the FileSystem API instead of having
to spec it myself ;'>.

   


Right now in our implementation (Fx 3.6.3), we work with files as 
copies, so if the underlying file changes, the case isn't handled.  But 
I agree that having an asynchronous API that is exposed to web content 
will allow more graceful behavior.


I'll take a look at the generic asynchronous "all-metadata" function 
and consider adding it to the File object.
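
Neither getMetadata() nor the creationDate/modificationDate attributes are in
any spec yet, so the shape below is purely hypothetical; it only sketches the
asynchronous, generic-metadata style being discussed:

  // Hypothetical shape only: getMetadata() and these attribute names are
  // not specced; this just illustrates an async, generic metadata call.
  var input = document.querySelector('input[type="file"]');
  input.onchange = function () {
    var file = input.files[0];
    file.getMetadata(function (metadata) {
      // The callback fires asynchronously, so a frequently changing
      // modification time never blocks the calling script.
      console.log('size: ' + metadata.size);
      console.log('created: ' + metadata.creationDate);
      console.log('modified: ' + metadata.modificationDate);
    });
  };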


-- A*



Re: [whatwg] and links with @rel=embed

2010-05-18 Thread Silvia Pfeiffer
On Wed, May 19, 2010 at 4:38 AM, Tab Atkins Jr.  wrote:
> On Tue, May 18, 2010 at 11:20 AM, bjartur  wrote:
>>
>> First of all I think we should use <a rel=embed> instead of
>> <video>. I'm not aware of previous proposals of that on this list. Feel 
>> free to provide links if it's already been proposed.
>
> That's syntax; like I said, syntax has never been a problem.
>
>> Second, all the responses I've seen so far have been along the lines of 
>> "it's the HTML5 way" (implying it's more of an XHTML 1 way, or [insert 
>> unfashionable tech here] way) or that video is so important that it deserves 
>> first-class treatment, and for the sake of completeness <audio> has to be 
>> included as well (though interactive content, text and 3D models don't 
>> deserve to be "first-class").
>
> Then you haven't looked into it enough.
>
> The reason why a single inclusive multimedia element is bad is because
> different types of media have different requirements.  The javascript
> API that makes sense to expose for audio is different than the one you
> want to expose for video.  Video needs subtitles, audio doesn't (it
> needs transcription, but that's something different entirely, and can
> be handled with a simple link).  Etc.
>
> Theoretically, you can swap out what type of api you expose based on
> what the currently active media is.  In practice, that's a horrible
> idea and no one wants to try to do it again.  It's even harder to try
> and sort out multiple additional resources that should be attached to
> particular media, but not others, like subtitle tracks that should be
> attached to a video but shouldn't be exposed to an audio, etc.

I disagree with that last statement - audio does need subtitle-like
constructs, too. For example, if you want lyrics to be displayed in
sync with the audio, that can be done with the same construct that
captions or subtitles use for video.
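
A minimal sketch of that kind of script-driven sync, assuming a hand-written
cue list (the cue times and the #lyrics element are made up for illustration):

  var cues = [
    { start: 0.0, end: 4.5, text: 'First line of the lyrics' },
    { start: 4.5, end: 9.0, text: 'Second line of the lyrics' }
  ];
  var audio = document.querySelector('audio');
  var display = document.getElementById('lyrics');
  audio.addEventListener('timeupdate', function () {
    var t = audio.currentTime;
    var line = '';
    for (var i = 0; i < cues.length; i++) {
      if (t >= cues[i].start && t < cues[i].end) { line = cues[i].text; break; }
    }
    display.textContent = line;   // show the cue active at the current time
  });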


But I do agree with the need to keep audio and video as different
resource types with their own APIs. Thus far in the life of HTML,
text has been the main driver of the Web, interspersed with images and
formatting instructions for text. By treating video and audio as
elements in their own right, we can finally move on into a world
of multimedia where audio and video can have their own formatting
instructions and their own JavaScript API - none of this would be
possible by leaving them hidden in an <object> element, or even in an
<embed> element, where it is totally unclear what the content will be.

Bjartur, I do wonder what exactly your motivation is in suggesting that
audio and video be degraded back to undetermined embedded external
content. What would be gained by going back to that approach? A lot would
be lost, though - if you have ever developed a website with Flash video,
you will know how much work is involved in creating a specific
JavaScript API for the Flash video player so that you can have some
interaction between the rest of the webpage and the video or audio
resource. We honestly don't ever want to go back to that.

Cheers,
Silvia.


Re: [whatwg] New File attributes creationDate, modificationDate and size

2010-05-18 Thread Eric Uhrhane
On Fri, May 14, 2010 at 5:05 PM, Arun Ranganathan  wrote:
> On 5/12/10 4:25 AM, Ashley Sheridan wrote:
>>
>> On Wed, 2010-05-12 at 00:05 -0400, Biju wrote:
>>
>>
>>>
>>> It would be good if we could also get the same information on the server
>>> side when a user uploads a file using a form with file controls,
>>> i.e. like the suggestion at
>>> https://bugzilla.mozilla.org/show_bug.cgi?id=549253
>>> (it works even with JavaScript disabled).
>>>
>>> Also remember that modificationDate can be a time before creationDate on
>>> Windows.
>>> This is because modificationDate gets copied when you copy a file,
>>> so it essentially shows the modification date of the actual content.
>>>
>>> creationDate, on the other hand, is the time the file was created in its
>>> folder/directory, so when you copy a file to a new directory it will
>>> show the time of the copy.
>>>
>>> PS: for the JS option there is Mozilla bug 390776
>>>
>>
>>
>
> I intend to update the File API so that the File object exposes creationDate
> and modificationDate.

You might want to consider making an async getMetadata function; see
discussion ending at [1].  Async because the modification time can
change often, and as a generic all-metadata function because it's easy
to expand and experiment with.  And if you put that right in the File
API, then I can inherit it from the FileSystem API instead of having
to spec it myself ;'>.

  Eric

[1] http://lists.w3.org/Archives/Public/public-device-apis/2010Apr/0054.html


[whatwg] ProgressEvent for appcache - loadedItems vs loaded

2010-05-18 Thread Patrick Mueller

under 6.6.4 "Downloading or updating an application cache"
under "The application cache download process steps are as follows:"
under step 17: "The application cache download process steps are as
follows:"
under sub-step 2: "For each cache host associated with an application cache"

This is about the use of the "lengthComputable", "total", and "loaded"
attributes of the ProgressEvent interface.

There is a new draft of the Progress events spec here:

   http://dev.w3.org/2006/webapi/progress/Progress.html

In this draft, two new attributes have been added to the ProgressEvent
interface: "loadedItems" and "totalItems". These attributes might be
better suited here than the "total" and "loaded" attributes.
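
A sketch of how a page might consume such an event; "loadedItems" and
"totalItems" come only from the draft above and may not be exposed by any
implementation, so this falls back to "loaded"/"total":

  applicationCache.addEventListener('progress', function (event) {
    if (!event.lengthComputable) {
      console.log('Caching; total number of items unknown');
      return;
    }
    // loadedItems/totalItems are from the draft Progress spec; loaded/total
    // are what the appcache steps currently reference.
    var done  = (event.loadedItems !== undefined) ? event.loadedItems : event.loaded;
    var total = (event.totalItems  !== undefined) ? event.totalItems  : event.total;
    console.log('Cached ' + done + ' of ' + total + ' items');
  }, false);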

The same applies to step 18, which indicates that a final ProgressEvent
should be sent at the completion of the cache download process.

--
Patrick Mueller - http://muellerware.org



Re: [whatwg] Should scripts and plugins in contenteditable content be enabled or disabled?

2010-05-18 Thread Robert O'Callahan
On Wed, May 19, 2010 at 5:35 AM, Ojan Vafai  wrote:

> The webkit behavior of allowing all scripts makes the most sense to me. It
> should be possible to disable scripts, but that capability shouldn't be tied
> to editability. The clean solution for the CKEditor developer is to use a
> sandboxed iframe.
>
> I don't see a security benefit for disabling script as you'd have all the
> same issues with loading any user-content in a non-editable area. The only
> catch is that you *do* need to disable script from pasted and drag-dropped
> content (see http://trac.webkit.org/changeset/53442). Basically, any site
> serving user-content will already need to mitigate XSS some other way, so
> disabling script in editable areas is not necessary, but paste/drag-drop
> can't reasonably rely on server-side solutions, so must be done by the UA.
>
> Putting my developer hat on, trying to make Google Gadgets work in Google's
> rich text editor inside Firefox designMode was awful due to
> https://bugzilla.mozilla.org/show_bug.cgi?id=519928. A large percentage of
> Google Gadgets load as iframes and require JavaScript to run on load. We had
> to play tricks with turning off designMode, appending the iframe and turning
> designMode back on. It was an awful solution that never worked very well.
>

That makes sense to me. I'll see what the other editor developers think.

Rob
-- 
"He was pierced for our transgressions, he was crushed for our iniquities;
the punishment that brought us peace was upon him, and by his wounds we are
healed. We all, like sheep, have gone astray, each of us has turned to his
own way; and the LORD has laid on him the iniquity of us all." [Isaiah
53:5-6]


Re: [whatwg] and links with @rel=embed

2010-05-18 Thread Tab Atkins Jr.
On Tue, May 18, 2010 at 11:20 AM, bjartur  wrote:
>
> First of all I think we should use <a rel=embed> instead of
> <video>. I'm not aware of previous proposals of that on this list. Feel free 
> to provide links if it's already been proposed.

That's syntax; like I said, syntax has never been a problem.

> Second, all the responses I've seen so far have been along the lines of "it's 
> the HTML5 way" (implying it's more of an XHTML 1 way, or [insert 
> unfashionable tech here] way) or that video is so important that it deserves 
> first-class treatment, and for the sake of completeness <audio> has to be 
> included as well (though interactive content, text and 3D models don't 
> deserve to be "first-class").

Then you haven't looked into it enough.

The reason why a single inclusive multimedia element is bad is because
different types of media have different requirements.  The javascript
API that makes sense to expose for audio is different than the one you
want to expose for video.  Video needs subtitles, audio doesn't (it
needs transcription, but that's something different entirely, and can
be handled with a simple link).  Etc.

Theoretically, you can swap out what type of api you expose based on
what the currently active media is.  In practice, that's a horrible
idea and no one wants to try to do it again.  It's even harder to try
and sort out multiple additional resources that should be attached to
particular media, but not others, like subtitle tracks that should be
attached to a video but shouldn't be exposed to an audio, etc.


> Isn't interactive content important enough? What about text? What if one 
> wants to link to interactive maps? s...@src? <a> with .embed 
> {content: url(attr(href)) }? AFAIK CSS doesn't support it, and if it does, 
> <a rel="embed"> could be used as well, even without explicit browser support.

SVG can be embedded directly in HTML.  Otherwise, you can use <object>
or <iframe> as the generic handlers.  Note, though, that these latter
methods deal with their contents in an agnostic way.  You don't get
special APIs exposed on <object> if you link to an SVG file, frex.  If
you want special handling of certain types of media, though, you
really want a specialized element like <video> and <audio> get.

~TJ


Re: [whatwg] and links with @rel=embed

2010-05-18 Thread bjartur


First of all I think we should use <a rel=embed> instead of
<video>. I'm not aware of previous proposals of that on this list. Feel free 
to provide links if it's already been proposed.

Second, all the responses I've seen so far have been along the lines of "it's 
the HTML5 way" (implying it's more of an XHTML 1 way, or [insert unfashionable 
tech here] way) or that video is so important that it deserves first-class 
treatment, and for the sake of completeness <audio> has to be included as well 
(though interactive content, text and 3D models don't deserve to be 
"first-class").

Isn't interactive content important enough? What about text? What if one 
wants to link to interactive maps? s...@src? <a> with .embed 
{content: url(attr(href)) }? AFAIK CSS doesn't support it, and if it does, 
<a rel="embed"> could be used as well, even without explicit browser support.

Also someone wrote that one should use <object> for media-types that aren't 
supported in HTML yet. If you insist on keeping <video> and <audio>, think of 
this as a way for second-class media-types that the WHATWG hasn't approved, or 
that haven't been implemented, to use some of the features of <video> and <audio>.

It's possible to specify the media type with attributes like @media for media 
queries and @type for the type of specific resources. That way media queries 
and MIME media types like audio, video, model and text can be reused, and all 
types IANA might add in the future.

IMO multimedia should be "first-class". And embedding more information than 
necessary in tag names is just /wrong/ and hampers compatibility and 
extensibility.


Re: [whatwg] Speech input element

2010-05-18 Thread Doug Schepers

Hi, Bjorn-

Bjorn Bringert wrote (on 5/17/10 9:05 AM):

Back in December there was a discussion about web APIs for speech
recognition and synthesis that saw a decent amount of interest
(http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-December/thread.html#24281).
Based on that discussion, we would like to propose a simple API for
speech recognition, using a new <input type="speech"> element. An
informal spec of the new API, along with some sample apps and use
cases can be found at:
http://docs.google.com/Doc?docid=0AaYxrITemjbxZGNmZzc5cHpfM2Ryajc5Zmhx&hl=en.

It would be very helpful if you could take a look and share your
comments. Our next steps will be to implement the current design, get
some feedback from web developers, continue to tweak, and seek
standardization as soon it looks mature enough and/or other vendors
become interested in implementing it.


This is important work, thanks for taking it on and bringing it to a 
wider discussion forum.  Here are a couple of other venues where you might 
also consider discussing it, beyond the WHATWG list:


* W3C just launched a new Audio Incubator Group (Audio XG), as a forum 
to discuss various aspects of audio on the Web.  The Audio XG is not 
intended to produce Recommendation-track specifications like this 
(though they will likely prototype and write a draft spec for a 
read-write audio API), but it could serve a role in helping work out use 
cases and requirements, reviewing specs, and so forth.  I'm not totally 
sure that this is relevant to your interests, but I thought I would 
bring it up.


* The Voice Browser Working Group is very interested in bringing their 
work and experience into the graphical browser world, so you should work 
with them or get their input.  As I understand it, some of them plan to 
join the Audio XG, too (specifically to talk about speech synthesis in 
the larger context), so that might be one forum to have some 
conversations.  VoiceXML is rather different than X/HTML or the browser 
DOM, and the participants in the VBWG don't necessarily have the right 
experience in graphical browser approaches, so I think there's an 
opportunity for good conversation and cross-pollination here.


[1] http://www.w3.org/2005/Incubator/audio/
[2] http://www.w3.org/Voice/

Regards-
-Doug


Re: [whatwg] Should scripts and plugins in contenteditable content be enabled or disabled?

2010-05-18 Thread Ojan Vafai
On Fri, Apr 23, 2010 at 2:34 AM, Robert O'Callahan wrote:

> On Fri, Apr 23, 2010 at 6:52 PM, Simon Pieters  wrote:
>
>> It seems Hixie has decided to go back to the WebKit behavior in the spec
>> for designMode.
>>
>> http://html5.org/tools/web-apps-tracker?from=2817&to=2818
>>
>
>  It's certainly the easiest to implement, but you can see feedback in
> https://bugzilla.mozilla.org/show_bug.cgi?id=519928 that this makes life
> difficult for people writing editors.
>
> Thanks for the links.
>

The webkit behavior of allowing all scripts makes the most sense to me. It
should be possible to disable scripts, but that capability shouldn't be tied
to editability. The clean solution for the CKEditor developer is to use a
sandboxed iframe.

I don't see a security benefit for disabling script as you'd have all the
same issues with loading any user-content in a non-editable area. The only
catch is that you *do* need to disable script from pasted and drag-dropped
content (see http://trac.webkit.org/changeset/53442). Basically, any site
serving user-content will already need to mitigate XSS some other way, so
disabling script in editable areas is not necessary, but paste/drag-drop
can't reasonably rely on server-side solutions, so must be done by the UA.

Putting my developer hat on, trying to make Google Gadgets work in Google's
rich text editor inside Firefox designMode was awful due to
https://bugzilla.mozilla.org/show_bug.cgi?id=519928. A large percentage of
Google Gadgets load as iframes and require JavaScript to run on load. We had
to play tricks with turning off designMode, appending the iframe and turning
designMode back on. It was an awful solution that never worked very well.
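
A rough sketch of the trick being described; insertGadgetIframe and gadgetUrl
are made-up names, and real code also had to save and restore the selection,
which is part of why this never worked well:

  function insertGadgetIframe(editorDocument, gadgetUrl) {
    editorDocument.designMode = 'off';           // let the gadget's scripts run
    var iframe = editorDocument.createElement('iframe');
    iframe.src = gadgetUrl;
    iframe.onload = function () {
      editorDocument.designMode = 'on';          // re-enable editing afterwards
    };
    editorDocument.body.appendChild(iframe);
  }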

Ojan


Re: [whatwg] and links with @rel=embed

2010-05-18 Thread Tab Atkins Jr.
On Tue, May 18, 2010 at 4:11 AM, bjartur  wrote:
> 
>
> This is yet another proposal to replace <video>, <audio>, <img> etc with a 
> single element: <a rel=embed>.

You indicate that you are aware of the previous proposals to provide a
single element for all media resources.  Are you aware of the
counter-arguments against them?  If so, what is your answer to them?
At the moment all you've done is provide a marginally different syntax
than what's come before, but syntax was never an issue with previous
unified-media elements.

~TJ


Re: [whatwg] Speech input element

2010-05-18 Thread Kazuyuki Ashimura

Hi Bjorn and James,

Just FYI, W3C is organizing a workshop on Conversational Applications.
The main goal of the workshop is collecting use cases and requirements
for new models of human language to support mobile conversational
systems.  The workshop will be held on June 18-19 in Somerset, NJ, US.

The detailed call for participation is available at:
 http://www.w3.org/2010/02/convapps/cfp.html

I think there may be some discussion during the workshop about a
possible multimodal e-learning system as a use case.  Is either of you
by chance interested in the workshop?

Regards,

Kazuyuki


Bjorn Bringert wrote:

On Mon, May 17, 2010 at 10:55 PM, James Salsman  wrote:

On Mon, May 17, 2010 at 8:55 AM, Bjorn Bringert  wrote:

- What exactly are grammars builtin:dictation and builtin:search?

They are intended to be implementation-dependent large language
models, for dictation (e.g. e-mail writing) and search queries
respectively. I've tried to clarify them a bit in the spec now. There
should perhaps be more of these (e.g. builtin:address), maybe with
some being optional, mapping to builtin:dictation if not available.

Bjorn, are you interested in including speech recognition support for
pronunciation assessment such as is done by http://englishcentral.com/
, http://www.scilearn.com/products/reading-assistant/ ,
http://www.eyespeakenglish.com/ , and http://wizworldonline.com/ ,
http://www.8dworld.com/en/home.html ?

Those would require different sorts of language models and grammars
such as those described in
http://www.springerlink.com/content/l0385t6v425j65h7/

Please let me know your thoughts.


I don't have SpringerLink access, so I couldn't read that article. As
far as I could tell from the abstract, they use phoneme-level speech
recognition and then calculate the edit distance to the "correct"
phoneme sequences. Do you have a concrete proposal for how this could
be supported? Would support for PLS
(http://www.w3.org/TR/pronunciation-lexicon/) links in SRGS be enough
(the SRGS spec already includes that)?



--
Kazuyuki Ashimura / W3C Multimodal & Voice Activity Lead
mailto: ashim...@w3.org
voice: +81.466.49.1170 / fax: +81.466.49.1171



[whatwg] and links with @rel=embed

2010-05-18 Thread bjartur


This is yet another proposal to replace <video>, <audio>, <img> etc with a 
single element: <a rel=embed>.
Example says more than a hundred words:

A video recording of the conference
A sound recording of the conference

Change @rel=embed to some better name if you want and correct the media query 
which may be wrong but you should get the general idea.
This seems backwards compatible and I'd argue that it's more semantic, even to 
unsupporting browsers, than wrapping it up in Flash (though authors can add an 
<object> after the <a>s if they want).
Also it's possible to group elements similarly to alternate stylesheets with the 
same title, or by nesting <a> elements. We just have to standardize on either one.

...
 
Video track
Audio track



And some quotes from Tim Berners-Lee, who thought about this embedding issue 
back in '93.

timbl:
>I had imagined that figures would be represented as
>
>    <a href="..." rel="EMBED, PRESENT">Figure</a>
>
>where the relationship values mean
>
>EMBED   Embed this here when presenting it
>PRESENT Present this whenever the source document is presented
>Note that you can have various combinations of these, and if the browser 
>doesn’t support either one, it doesn’t break.
I'm not quite sure how PRESENT is useful, though.

timbl:
>Let the IMG tag be INCLUDE and let it refer to an arbitrary document type. Or 
>EMBED if INCLUDE sounds like a cpp include which people will expect to provide 
>SGML source code to be parsed inline — not what was intended.
s/IMG/video and audio/


Re: [whatwg] Speech input element

2010-05-18 Thread Kazuyuki Ashimura

Hi Bjorn,

Thank you for bringing this topic (again :) to the WHATWG list.
I'd like to bring it to the W3C Voice Browser Working Group (and
maybe the Multimodal Interaction Working Group as well) and ask
the group participants for their opinions.

As you might know, the group recently created a task force named
"Voice on the Web" and is working hard to promote voice technology
in all kinds of Web applications.

Regards,

Kazuyuki


Bjorn Bringert wrote:

Back in December there was a discussion about web APIs for speech
recognition and synthesis that saw a decent amount of interest
(http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-December/thread.html#24281).
Based on that discussion, we would like to propose a simple API for
speech recognition, using a new <input type="speech"> element. An
informal spec of the new API, along with some sample apps and use
cases can be found at:
http://docs.google.com/Doc?docid=0AaYxrITemjbxZGNmZzc5cHpfM2Ryajc5Zmhx&hl=en.

It would be very helpful if you could take a look and share your
comments. Our next steps will be to implement the current design, get
some feedback from web developers, continue to tweak, and seek
standardization as soon it looks mature enough and/or other vendors
become interested in implementing it.



--
Kazuyuki Ashimura / W3C Multimodal & Voice Activity Lead
mailto: ashim...@w3.org
voice: +81.466.49.1170 / fax: +81.466.49.1171


Re: [whatwg] Timestamp from video source in order to sync (e.g. expose OGG timestamp to javascript)

2010-05-18 Thread Silvia Pfeiffer
On Tue, May 18, 2010 at 7:28 PM, Robert O'Callahan  wrote:
> On Tue, May 18, 2010 at 8:23 PM, Odin Omdal Hørthe 
> wrote:
>>
>> Justin Dolske's idea looks rather nice:
>> > This seems like a somewhat unfortunate thing for the spec, I bet
>> > everyone's
>> > going to get it wrong because it won't be common. :( I can't help but
>> > wonder if
>> > it would be better to have a startTimeOffset property, so that
>> > .currentTime et
>> > al are all still have a timeline starting from 0, and if you want the
>> > "real"
>> > time you'd use .currentTime + .startTimeOffset.
>> >
>> > I'd also suspect we'll want the default video controls to normalize
>> > everything
>> > to 0 (.currentTime - .startTime), since it would be really confusing
>> > otherwise.
>
>
> That's exactly what I've advocated before. I lost the argument, but I forget
> why, probably because I didn't understand the reasons.


To be honest, it doesn't make much sense to display the "wrong" time
in a player. If a video stream starts at 10:30am and runs for 30 min,
then a person joining the stream 10 min in should see a time of 10 min
- or better yet 10:40am - which is in sync with what the people who
joined at the start see. It would be rather confusing if the same
position in a video were linked by one person as "at offset 10min"
while another would say "at offset 0min". And since the W3C Media
Fragments WG is defining temporal addressing, such diverging pointers
will even end up in URLs - and how should those be interpreted then?
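
A small sketch of that divergence, assuming a live stream whose earliest
available position is 600 seconds in and assuming the startTime attribute
exposes that offset (one of the two readings being debated in this thread):

  // Which fragment URL a page generates depends on whether currentTime runs
  // on the stream's own timeline or on a timeline normalised to start at 0.
  var video = document.querySelector('video');
  function temporalFragment(pageUrl) {
    var onStreamTimeline = Math.floor(video.currentTime);                    // e.g. 600
    var normalisedToZero = Math.floor(video.currentTime - video.startTime);  // e.g. 0
    return {
      absolute:   pageUrl + '#t=' + onStreamTimeline,
      normalised: pageUrl + '#t=' + normalisedToZero
    };
  }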

Cheers,
Silvia.


Re: [whatwg] Speech input element

2010-05-18 Thread Bjorn Bringert
On Tue, May 18, 2010 at 10:27 AM, Satish Sampath  wrote:
>> Well, the problem with alert is that the assumption (which may or may not
>> always hold) is that when alert() is opened, web page shouldn't run
>> any scripts. So should  fire some events when the
>> recognition is canceled (if alert cancels recognition), and if yes,
>> when? Or if recognition is not canceled, and something is recognized
>> (so "input" event should be dispatched), when should the event actually
>> fire? The problem is pretty much the same with synchronous XMLHttpRequest.
>
> In my opinion, once the speech input element has started recording, any event
> which takes the user's focus away from actually speaking should ideally stop
> the speech recognition. This would include switching to a new window, a new
> tab or modal/alert dialogs, submitting a form or navigating to a new page in
> the same tab/window.

Yes, I agree with that. The tricky issue, as Olli points out, is
whether and when the 'error' event should fire when recognition is
aborted because the user moves away or gets an alert. What does
XMLHttpRequest do?

-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902


Re: [whatwg] Timestamp from video source in order to sync (e.g. expose OGG timestamp to javascript)

2010-05-18 Thread Robert O'Callahan
On Tue, May 18, 2010 at 8:23 PM, Odin Omdal Hørthe wrote:

> Justin Dolske's idea looks rather nice:
> > This seems like a somewhat unfortunate thing for the spec, I bet
> everyone's
> > going to get it wrong because it won't be common. :( I can't help but
> wonder if
> > it would be better to have a startTimeOffset property, so that
> .currentTime et
> > al are all still have a timeline starting from 0, and if you want the
> "real"
> > time you'd use .currentTime + .startTimeOffset.
> >
> > I'd also suspect we'll want the default video controls to normalize
> everything
> > to 0 (.currentTime - .startTime), since it would be really confusing
> otherwise.
>

That's exactly what I've advocated before. I lost the argument, but I forget
why, probably because I didn't understand the reasons.

Rob
-- 
"He was pierced for our transgressions, he was crushed for our iniquities;
the punishment that brought us peace was upon him, and by his wounds we are
healed. We all, like sheep, have gone astray, each of us has turned to his
own way; and the LORD has laid on him the iniquity of us all." [Isaiah
53:5-6]


Re: [whatwg] Speech input element

2010-05-18 Thread Satish Sampath
>
> Well, the problem with alert is that the assumption (which may or may not
> always hold) is that when alert() is opened, the web page shouldn't run
> any scripts. So should <input type="speech"> fire some events when the
> recognition is canceled (if alert cancels recognition), and if yes,
> when? Or if recognition is not canceled, and something is recognized
> (so "input" event should be dispatched), when should the event actually
> fire? The problem is pretty much the same with synchronous XMLHttpRequest.
>

In my opinion, once the speech input element has started recording, any event
which takes the user's focus away from actually speaking should ideally stop
the speech recognition. This would include switching to a new window, a new
tab or modal/alert dialogs, submitting a form or navigating to a new page in
the same tab/window.

--
Cheers
Satish


Re: [whatwg] Speech input element

2010-05-18 Thread Olli Pettay

On 5/18/10 11:27 AM, Bjorn Bringert wrote:

On Mon, May 17, 2010 at 9:23 PM, Olli Pettay  wrote:

On 5/17/10 6:55 PM, Bjorn Bringert wrote:


(Looks like half of the first question is missing, so I'm guessing
here) If you are asking about when the web app loses focus (e.g. the
user switches to a different tab or away from the browser), I think
the recognition should be cancelled. I've added this to the spec.



Oh, where did the rest of the question go.

I was going to ask about alert()s.
What happens if alert() pops up while recognition is on?
Which events should fire and when?


Hmm, good question. I think that either the recognition should be
cancelled, like when the web app loses focus, or it should continue
just as if there was no alert. Are there any browser implementation
reasons to do one or the other?



Well, the problem with alert is that the assumption (which may or may 
not always hold) is that when alert() is opened, the web page shouldn't run
any scripts. So should <input type="speech"> fire some events when the
recognition is canceled (if alert cancels recognition), and if yes,
when? Or if recognition is not canceled, and something is recognized
(so "input" event should be dispatched), when should the event actually 
fire? The problem is pretty much the same with synchronous XMLHttpRequest.



-Olli


Re: [whatwg] Speech input element

2010-05-18 Thread Bjorn Bringert
On Tue, May 18, 2010 at 8:02 AM, Anne van Kesteren  wrote:
> On Mon, 17 May 2010 15:05:22 +0200, Bjorn Bringert 
> wrote:
>>
>> Back in December there was a discussion about web APIs for speech
>> recognition and synthesis that saw a decent amount of interest
>>
>> (http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-December/thread.html#24281).
>> Based on that discussion, we would like to propose a simple API for
>> speech recognition, using a new <input type="speech"> element. An
>> informal spec of the new API, along with some sample apps and use
>> cases can be found at:
>>
>> http://docs.google.com/Doc?docid=0AaYxrITemjbxZGNmZzc5cHpfM2Ryajc5Zmhx&hl=en.
>>
>> It would be very helpful if you could take a look and share your
>> comments. Our next steps will be to implement the current design, get
>> some feedback from web developers, continue to tweak, and seek
>> standardization as soon it looks mature enough and/or other vendors
>> become interested in implementing it.
>
> I wonder how it relates to the <device> proposal already in the draft. In
> theory that supports microphone input too.

It would be possible to implement speech recognition on top of a
microphone input API. The most obvious approach would be to use
<device> to get an audio stream, and send that audio stream to a
server (e.g. using WebSockets). The server runs a speech recognizer
and returns the results.
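
A hedged sketch of that alternative; the <device> proposal never defined a
stable API, so the 'data' event and the shape of event.data below are
assumptions, as is the server URL:

  var device = document.querySelector('device');
  var socket = new WebSocket('wss://speech.example.com/recognize');
  device.addEventListener('data', function (event) {
    socket.send(event.data);                    // forward captured audio to the server
  });
  socket.onmessage = function (event) {
    console.log('recognized: ' + event.data);   // server returns its transcription
  };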

Advantages of the speech input element:

- Web app developers do not need to build and maintain a speech
recognition service.

- Implementations can choose to use client-side speech recognition.
This could give reduced network traffic and latency (but probably also
reduced recognition accuracy and language support). Implementations
could also use server-side recognition by default, switching to local
recognition in offline or low bandwidth situations.

- Using a general audio capture API would require APIs for things like
audio encoding and audio streaming. Judging from the past results of
specifying media features, this may be non-trivial. The speech input
element turns all audio processing concerns into implementation
details.

- Implementations can have special UI treatment for speech input,
which may be different from that for general audio capture.


Advantages of using a microphone API:

- Web app developers get complete control over the quality and
features of the speech recognizer. This is a moot point for most
developers though, since they do not have the resources to run their
own speech recognition service.

- Fewer features to implement in browsers (assuming that a microphone
API would be added anyway).

-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902


Re: [whatwg] Speech input element

2010-05-18 Thread Bjorn Bringert
On Mon, May 17, 2010 at 10:55 PM, James Salsman  wrote:
> On Mon, May 17, 2010 at 8:55 AM, Bjorn Bringert  wrote:
>>
>>> - What exactly are grammars builtin:dictation and builtin:search?
>>
>> They are intended to be implementation-dependent large language
>> models, for dictation (e.g. e-mail writing) and search queries
>> respectively. I've tried to clarify them a bit in the spec now. There
>> should perhaps be more of these (e.g. builtin:address), maybe with
>> some being optional, mapping to builtin:dictation if not available.
>
> Bjorn, are you interested in including speech recognition support for
> pronunciation assessment such as is done by http://englishcentral.com/
> , http://www.scilearn.com/products/reading-assistant/ ,
> http://www.eyespeakenglish.com/ , and http://wizworldonline.com/ ,
> http://www.8dworld.com/en/home.html ?
>
> Those would require different sorts of language models and grammars
> such as those described in
> http://www.springerlink.com/content/l0385t6v425j65h7/
>
> Please let me know your thoughts.

I don't have SpringerLink access, so I couldn't read that article. As
far as I could tell from the abstract, they use phoneme-level speech
recognition and then calculate the edit distance to the "correct"
phoneme sequences. Do you have a concrete proposal for how this could
be supported? Would support for PLS
(http://www.w3.org/TR/pronunciation-lexicon/) links in SRGS be enough
(the SRGS spec already includes that)?

-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902


Re: [whatwg] Speech input element

2010-05-18 Thread Bjorn Bringert
On Mon, May 17, 2010 at 9:23 PM, Olli Pettay  wrote:
> On 5/17/10 6:55 PM, Bjorn Bringert wrote:
>
>> (Looks like half of the first question is missing, so I'm guessing
>> here) If you are asking about when the web app loses focus (e.g. the
>> user switches to a different tab or away from the browser), I think
>> the recognition should be cancelled. I've added this to the spec.
>>
>
> Oh, where did the rest of the question go.
>
> I was going to ask about alert()s.
> What happens if alert() pops up while recognition is on?
> Which events should fire and when?

Hmm, good question. I think that either the recognition should be
cancelled, like when the web app loses focus, or it should continue
just as if there was no alert. Are there any browser implementation
reasons to do one or the other?


>> The grammar specifies the set of utterances that the speech recognizer
>> should match against. The grammar may be annotated with SISR, which
>> will be used to populate the 'interpretation' field in ListenResult.
>
> I know what grammars are :)

Yeah, sorry about my silly reply there, I just wasn't sure exactly
what you were asking.


> What I meant was that it is not very well specified that the result is
> actually put into .value etc.

Yes, good point. The alternatives would be to use either the
'utterance' or the 'interpretation' value from the most likely
recognition result. If the grammar does not contain semantics, those
are identical, so it doesn't matter in that case. If the developer has
added semantics to the grammar, the interpretation is probably more
interesting than the utterance. So my conclusion is that it would make
most sense to store the interpretation in @value. I've updated the
spec with better definitions of @value and @results.
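
A sketch only, using the element and attribute names from the informal
proposal (an <input type="speech"> control with value and results); none of
this is standardized, and the event name is taken from the discussion above
rather than from a finished spec:

  var speech = document.querySelector('input[type="speech"]');
  speech.addEventListener('input', function () {
    // Per the reasoning above, value holds the interpretation of the most
    // likely result when the grammar carries semantics.
    console.log('interpretation: ' + speech.value);
    var results = speech.results || [];
    for (var i = 0; i < results.length; i++) {
      console.log('utterance ' + i + ': ' + results[i].utterance);
    }
  });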


> And still, I'm still not quite sure what builtin:search actually
> is. What kind of grammar would that be? How is that different from
> builtin:dictation?

To be useful, those should probably be large statistical language
models (e.g. n-gram models) trained on different corpora. So
"builtin:dictation" might be trained on a corpus containing e-mails,
SMS messages and news text, and "builtin:search" might be trained on
query strings from a search engine. I've updated the spec to make
"builtin:search" optional, mapping to "builtin:dictation" if not
implemented. The exact language matched by these models would be
implementation dependent, and implementations may choose to be clever
about them. For example by:

- Dynamic tweaking for different web apps based on the user's previous
inputs and the text contained in the web app.

- Adding the names of all contacts from the user's address book to the
dictation model.

- Weighting place names based on geographic proximity (in an
implementation that has access to the user's location).


-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902


Re: [whatwg] Timestamp from video source in order to sync (e.g. expose OGG timestamp to javascript)

2010-05-18 Thread Odin Omdal Hørthe
On Tue, May 18, 2010 at 1:00 AM, Nikita Eelen  wrote:
> I think he means something similar to what QuickTime broadcaster and 
> quicktime streaming
> server does with a delay on a live stream or wowza media server with flash 
> media encoder
> when using h.264, unless I am misunderstanding something. Is that correct 
> Odin? Not sure
> how ice cast deals with it but I bet it's a similar issue,

Yes, I initially used Darwin Streaming Server, but found Icecast2 much
better for *my use*. So I use it in the same way. I'm having Icecast
buffer 1MB worth of data so that it can burst all that to the client
(the browser in this case) so that its own buffering can go faster. So
even there we're quite far behind.

And also, the browsers often stop for a few seconds, buffer a bit
more, and then continue playing (although they have already buffered
more than a few seconds ahead!), so they drift even further
away from real time.



But I have important news: my bug at Mozilla was closed because they
consider it to be covered by the spec already. Because:

> The startTime  attribute must, on getting, return the earliest possible 
> position, expressed in seconds.

And they take that to mean that in a live stream, this would be the time
I started the stream (which is how VLC behaves). So the stream in-browser
already shows 00:31:30 if we're 31 minutes and 30 seconds into the live
stream.

In that case the spec is actually good enough for my synchronisation needs.

You may watch this mozilla bug here:



However, I think it's rather hard to work out what the spec
means, because of *earliest POSSIBLE*. What is meant by "possible"? With
live streaming it is not possible to go further back in the stream.
What do you think is meant by this? If it does not help me, then
adding a field for getting the _real_ timecode data from the video
would be very useful.

It's talked about in this example:


> For example, if two clips have been concatenated into one video file, but the 
> video format
> exposes the original times for the two clips, the video data might expose a 
> timeline that
> goes, say, 00:15..00:29 and then 00:05..00:38. However, the user agent would 
> not expose
> those times; it would instead expose the times as 00:15..00:29 and 
> 00:29..01:02, as a
> single video.

That's well and good, but it would be nice to get the actual timecode
data for live streaming and these syncing uses if startTime is not the
earliest time that exists.


Justin Dolske's idea looks rather nice:
> This seems like a somewhat unfortunate thing for the spec, I bet everyone's
> going to get it wrong because it won't be common. :( I can't help but wonder 
> if
> it would be better to have a startTimeOffset property, so that .currentTime et
> al are all still have a timeline starting from 0, and if you want the "real"
> time you'd use .currentTime + .startTimeOffset.
>
> I'd also suspect we'll want the default video controls to normalize everything
> to 0 (.currentTime - .startTime), since it would be really confusing 
> otherwise.

from 
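
A tiny sketch of the arithmetic in the quoted proposal; startTimeOffset is
hypothetical (it appears in no spec), and under the proposal currentTime would
always start at 0:

  function controlsTime(video) {
    return video.currentTime;                          // 0-based, what the default controls show
  }
  function realWorldTime(video) {
    return video.currentTime + video.startTimeOffset;  // recover the original timeline
  }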

-- 
Beste helsing,
Odin Hørthe Omdal 
http://velmont.no


Re: [whatwg] Speech input element

2010-05-18 Thread Anne van Kesteren
On Mon, 17 May 2010 15:05:22 +0200, Bjorn Bringert   
wrote:

Back in December there was a discussion about web APIs for speech
recognition and synthesis that saw a decent amount of interest
(http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-December/thread.html#24281).
Based on that discussion, we would like to propose a simple API for
speech recognition, using a new <input type="speech"> element. An
informal spec of the new API, along with some sample apps and use
cases can be found at:
http://docs.google.com/Doc?docid=0AaYxrITemjbxZGNmZzc5cHpfM2Ryajc5Zmhx&hl=en.

It would be very helpful if you could take a look and share your
comments. Our next steps will be to implement the current design, get
some feedback from web developers, continue to tweak, and seek
standardization as soon it looks mature enough and/or other vendors
become interested in implementing it.


I wonder how it relates to the <device> proposal already in the draft. In  
theory that supports microphone input too.



--
Anne van Kesteren
http://annevankesteren.nl/