Re: [whatwg] Serving up Theora in the real world

2009-07-10 Thread Philip Jagenstedt
On Fri, 10 Jul 2009 23:26:40 +0200, Philip Jagenstedt wrote:



On Fri, 10 Jul 2009 19:23:44 +0200, Aryeh Gregor wrote:

On Fri, Jul 10, 2009 at 4:57 AM, Robert O'Callahan wrote:
The way we've implemented it in Firefox, we'll return "yes" if you
specify a codecs parameter and we support every codec in your list. So
v.canPlayType("video/ogg; codecs=vorbis,theora") returns "probably" in
Firefox 3.5. I think this is reasonable because I believe that, modulo
bugs in our implementation, we support the full Theora and Vorbis specs.
On the other hand, we will return "maybe" for v.canPlayType("video/ogg").
I think this distinction will be useful.


In what use case would an author want to make use of the distinction?
In either case, your only course of action is to try playing the
video. Maybe you'd try testing all the video types you support, and
if one is "maybe" while another is "probably" you'd go with
"probably"? That seems like a pretty marginal use case to support for
the sake of such a confusing API. Programmers expect binary logic,
not ternary (look at the complaints about SQL's NULL).


I agree that the current interface is ugly, and I quite fail to see what
the use for it is. With a boolean return value,
canPlayType("application/ogg") would return true if one can demux Ogg
streams. canPlayType("application/ogg; codecs=vorbis,dirac") would
return true if one can demux Ogg and decode Vorbis + Dirac.
Differentiating between "maybe"/"probably" really seems like an edge use
case, but you could if you really wanted to:

function ternaryCanPlayType(mime) {
  var [container, codecs] = mime.split(";");
  // canPlayType here is the hypothetical boolean version discussed above
  if (canPlayType(mime)) {
    return codecs ? "probably" : "maybe";
  } else {
    // If a codecs list was given, canPlayType(container) would tell you
    // whether the problem is with the container format or with the codecs.
    return ""; // was "no"
  }
}


Before someone conjures up an example where this doesn't exactly match the
current behavior, the point is simply that by calling canPlayType with or
without a codecs list, you can learn exactly which of the container
formats and codecs you are interested in are supported, without the need
for the strange "probably"/"maybe"/"" API.
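
For completeness, a minimal sketch of how an author could use the ternary
canPlayType() as currently specced, preferring "probably" over "maybe"
(the function name and candidate list are illustrative only, not from the
spec):

function pickSource(video, candidates) {
  var fallback = null;
  for (var i = 0; i < candidates.length; i++) {
    var answer = video.canPlayType(candidates[i].type);
    if (answer == "probably")
      return candidates[i].src; // confident answer: take it
    if (answer == "maybe" && !fallback)
      fallback = candidates[i].src; // remember the first weak answer
  }
  return fallback; // null if nothing is playable at all
}

var video = document.getElementsByTagName("video")[0];
video.src = pickSource(video, [
  { src: "clip.ogv", type: 'video/ogg; codecs="theora,vorbis"' },
  { src: "clip.ogv", type: "video/ogg" }
]);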



Unless there's some compelling use case that can't be handled with the
above I'd support canPlayType returning a boolean. The only issue I can
see is that canPlayType(foo)==true might be interpreted as a strong
promise of playability which can't be given. In that case just rename the
function to wouldTryTypeInResourceSelection (no, not really).




--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Codecs for <video> and <audio>

2009-07-08 Thread Philip Jagenstedt
On Tue, 07 Jul 2009 22:45:41 +0200, Charles Pritchard wrote:



On 7/7/09 1:10 PM, Philip Jagenstedt wrote:
On Tue, 07 Jul 2009 17:52:29 +0200, Charles Pritchard wrote:



Philip Jagenstedt wrote:
For all of the simpler use cases you can already generate sounds
yourself with a data URI. For example, this is 2 samples of silence:
"data:audio/wav;base64,UklGRigAAABXQVZFZm10IBABAAEARKwAAIhYAQACABAAZGF0YQQA".
Yes you can use this method, and with the current audio tag and  
autobuffer, it may work to some degree.


It does not produce smooth transitions.

At some point, a Blob / Stream API could make things like this easier.
If the idea is to write a Vorbis decoder in JavaScript, that would be
quite cool in a way, but for vendors already implementing Vorbis it
wouldn't really add anything. A pure JS implementation of any modern
audio codec would probably be a ridiculous amount of code and slow, so
I doubt it would be that useful in practice.


Well, I'd like to disagree, and reiterate my prior arguments. Vorbis
decoders have been written in ActionScript and in Java. They are not
ridiculous in size, nor in CPU usage. They can play audio streams
smoothly, and the file size is completely tolerable. And the idea is
codec neutrality; a Vorbis decoder is just one example.


OK, I won't make any assumptions about the size/speed of such an
implementation until I see one.


For some use cases you could use 2 audio elements in tandem, mixing new  
sound to a new data URI when the first is nearing the end (although  
sync can't be guaranteed with the current API). But yes, there are  
things which can only be done by a streaming API integrating into the  
underlying media framework.

Yes, the current API is inadequate. data: encoding is insufficient.
Here's the list of proposed features right out of a comment block in the
spec:

This list of features can be written without a spec, using ,
using a raw data buffer, and using ECMAScript.

A few of these features may need hardware level support, or a fast  
computer.

The <audio> tag would be invisible, and the <canvas> tag would
provide the user interface.
Your use cases probably fall under audio filters and synthesis. I
expect that attention will turn to gradually more complex use cases
when the basic API we have now is implemented and stable cross-browser
and cross-platform.

Yes, some of these use cases qualify as filters, some qualify as
synthesis. I'm proposing that simple filters and synthesis can be
accomplished with modern ECMAScript virtual machines and a raw data
buffer. My use cases are qualified to current capabilities.


Apart from those use cases, I'm proposing that a raw data buffer will
allow for codec neutrality.

There are dozens of minor audio codecs, some simpler than others, some
low bitrate, that could be programmed in ECMAScript and would run just
fine with modern ECMAScript VMs.

Transcoding lossy data is a sub-optimal solution. Allowing for arbitrary
codecs is a worthwhile endeavor. ECMAScript can detect if playback is
too slow.


Additionally, in some cases, the programmer could work around broken
codec implementations. It's forward-looking; it allows real backward
compatibility and interoperability across browsers.

<canvas> allows for arbitrary, programmable video; <audio> should allow
for programmable audio. Then we can be codec neutral in our media
elements.


While stressing that I don't think this should go into the spec until  
there's a proof-of-concept implementation that does useful stuff, is the  
idea to set audio.src=new MySynthesizer() and play()? (MySynthesizer would  
need to implement some standard interface.) You also have the question of  
push vs pull, i.e. does the audio source request data from the synthesizer  
when needed or does the synthesizer need to run a loop pushing audio data?
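
To make the pull half of that question concrete, here is a purely
hypothetical sketch (none of these names exist in any spec): the UA would
call readSamples() whenever its output buffer runs low, whereas a push
design would instead have the script call something like audio.write()
from a timer loop.

function SineSynthesizer(freq) {
  this.sampleRate = 44100; // assumed: the UA reads these fields
  this.channels = 1;
  var phase = 0;
  // Pull model: the UA hands us a buffer to fill and drives the timing.
  this.readSamples = function (buffer) {
    for (var i = 0; i < buffer.length; i++) {
      buffer[i] = Math.sin(phase); // float samples in [-1, 1]
      phase += 2 * Math.PI * freq / this.sampleRate;
    }
    return buffer.length; // number of samples produced
  };
}

// Hypothetical usage: audio.src = new SineSynthesizer(440); audio.play();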


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Codecs for <video> and <audio>

2009-07-07 Thread Philip Jagenstedt
On Tue, 07 Jul 2009 17:52:29 +0200, Charles Pritchard wrote:



Philip Jagenstedt wrote:
For all of the simpler use cases you can already generate sounds
yourself with a data URI. For example, this is 2 samples of silence:
"data:audio/wav;base64,UklGRigAAABXQVZFZm10IBABAAEARKwAAIhYAQACABAAZGF0YQQA".
Yes you can use this method, and with the current audio tag and  
autobuffer, it may work to some degree.


We've used the data:audio/midi technique, and we've experimented with
audio/wav; the data: injection work-around does not currently work all
that well.

It does not produce smooth transitions. We can use raw encoding instead
of base64 to save on CPU cycles, but it's still quite "hackish".


It might be worthwhile implementing the API you want as a JavaScript
library and seeing if you can actually do useful things with it. If the
use cases are compelling and require native browser support to be
performant enough, perhaps it could go into a future version of HTML.


Overall, we cannot make near-real-time effects, nor jitter-free
compositions.

We've used wav and midi in a JavaScript library, using the data: URL
technique.

The data: injection technique is inefficient; it's not workable.

Opera has been championing Xiph codecs on this list. There are
ActionScript and Java Vorbis players developed using the most basic of
APIs.

Isn't that use case compelling enough?


If the idea is to write a Vorbis decoder in JavaScript, that would be
quite cool in a way, but for vendors already implementing Vorbis it
wouldn't really add anything. A pure JS implementation of any modern
audio codec would probably be a ridiculous amount of code and slow, so I
doubt it would be that useful in practice.


For some use cases you could use 2 audio elements in tandem, mixing new  
sound to a new data URI when the first is nearing the end (although sync  
can't be guaranteed with the current API). But yes, there are things which  
can only be done by a streaming API integrating into the underlying media  
framework.
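
A rough sketch of that two-element approach (the names are illustrative;
timeupdate fires only a few times per second, so the handover point is
approximate and a small gap or overlap is likely):

// Alternate between two audio elements: while one plays, load the next
// chunk into the other, then start it as the first nears its end.
var current = new Audio(), standby = new Audio();
function playChunks(dataUris) {
  var i = 0;
  function advance() {
    if (i >= dataUris.length) return;
    standby.src = dataUris[i++]; // let the idle element buffer ahead
    current.ontimeupdate = function () {
      if (current.duration - current.currentTime < 0.25) {
        current.ontimeupdate = null;
        standby.play(); // best-effort handover
        var t = current; current = standby; standby = t;
        advance();
      }
    };
  }
  current.src = dataUris[i++];
  current.play();
  advance();
}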


Here's the list of proposed features right out of a comment block in the
spec:


* frame forward / backwards / step(n) while paused
* hasAudio, hasVideo, hasCaptions, etc
* per-frame control: get current frame; set current frame
* queue of content
  - pause current stream and insert content at front of queue to play
    immediately
  - pre-download another stream
  - add stream(s) to play at end of current stream
  - pause playback upon reaching a certain time
  - playlists, with the ability to get metadata out of them (e.g. xspf)
* control over closed captions:
  - enable, disable, select language
  - event that sends caption text to script
* in-band metadata and cue points to allow:
  - Chapter markers that synchronize to playback (without having to
    poll the playhead position)
  - Annotations on video content (i.e., pop-up video)
  - General custom metadata store (ratings, etc.)
* notification of chapter labels changing on the fly:
  - onchapterlabelupdate, which has a time and a label
* cue points that trigger at fixed intervals, so that e.g. animation
  can be synced with the video
* general metadata, implemented as getters (don't expose the whole thing)
  - getMetadata(key: string, language: string) => HTMLImageElement or
    string
  - onmetadatachanged (no context info)
* external captions support (request from John Foliot)
* video: applying CSS filters
* an event to notify people of when the video size changes
  (e.g. for chained Ogg streams of multiple independent videos)
* balance and 3D position audio
* audio filters
* audio synthesis
* feedback to the script on how well the video is playing
  - frames per second?
  - skipped frames per second?
  - an event that reports playback difficulties?
  - an arbitrary quality metric?
* bufferingRate/bufferingThrottled (see v3BUF)
* events for when the user agent's controls get shown or hidden
  so that the author's controls can get out of the way of the UA's

Your use cases probably fall under audio filters and synthesis. I expect  
that attention will turn to gradually more complex use cases when the  
basic API we have now is implemented and stable cross-browser and  
cross-platform.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Codecs for <video> and <audio>

2009-07-07 Thread Philip Jagenstedt
On Tue, 07 Jul 2009 03:44:25 +0200, Charles Pritchard wrote:



Ian Hickson wrote:

On Mon, 6 Jul 2009, Charles Pritchard wrote:


Ian Hickson wrote:


On Mon, 6 Jul 2009, Charles Pritchard wrote:

This is on the list of things to consider in a future version. At  
this point I don't really want to add new features yet because  
otherwise we'll never get the browser vendors caught up to  
implementing the same spec. :-)



Consider a programmable <audio> element as a priority.

Could you elaborate on what your use cases are? Is it just the  
ability to manually decode audio tracks?



Some users could manually decode a Vorbis audio stream.

I'm interested in altering pitch and pre-mixing channels. I believe  
some of these things are explored in CSS already.


There are accessibility cases, for the visually impaired, and I think  
that they will be better explored.




If you could elaborate on these use cases that would be really useful.  
How do you envisage using these features on Web pages?


Use a sound of varying pitch to hint to a user the location of their
mouse (is it hovering over a button, is it x/y pixels away from the edge
of the screen, how close is it to the center).

Alter the pitch of a sound to make a very cheap MIDI instrument.

Pre-mix a few generated sounds, because the client processor is slow.

Alter the pitch of an actual audio recording, and pre-mix it, to give
different-sounding voices to pre-recorded readings of a single text, as
has been tried for "male" and "female" sound fonts.

Support very simple audio codecs, and programmable synthesizers.

The API must support a playback buffer.

putAudioBuffer(in AudioData) [ Error if AudioData properties are not
supported ]
createAudioData(in sampleHz, bitsPerSample, length) [ Error if
properties are not supported. ]

AudioData(sampleHz, bitsPerSample, length, AudioDataArray)
AudioDataArray(length, IndexGetter, IndexSetter), 8 bits per property.

I think that's about it.
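
A speculative example of how the proposed calls might be used, assuming
(this is not defined by the proposal above) that the AudioDataArray is
reachable as a .data property, that samples are 8-bit unsigned with 128
as silence, and that audio is an audio element object:

// Generate one second of a 440 Hz tone and queue it for playback.
var rate = 8000;
var tone = audio.createAudioData(rate, 8, rate); // 1 second, mono
for (var i = 0; i < tone.length; i++) {
  tone.data[i] = 128 + Math.round(127 * Math.sin(2 * Math.PI * 440 * i / rate));
}
audio.putAudioBuffer(tone); // hand the generated samples to the element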

(There has been some discussion of supporting an "audio canvas" before,
but a lack of compelling use cases has really been the main blocker.
Without a good understanding of the use cases, it's hard to design an
API.)





For all of the simpler use cases you can already generate sounds yourself
with a data URI. For example, this is 2 samples of silence:
"data:audio/wav;base64,UklGRigAAABXQVZFZm10IBABAAEARKwAAIhYAQACABAAZGF0YQQA".


It might be worthwhile implementing the API you want as a JavaScript
library and seeing if you can actually do useful things with it. If the
use cases are compelling and require native browser support to be
performant enough, perhaps it could go into a future version of HTML.


--
Philip Jägenstedt
Core Developer
Opera Software