Re: [whatwg] Extending HTML 5 video for adaptive streaming
On Sat, Jul 2, 2011 at 2:51 AM, Aaron Colwell <acolw...@google.com> wrote:
> On Thu, Jun 30, 2011 at 4:13 PM, Robert O'Callahan <rob...@ocallahan.org> wrote:
>> On Fri, Jul 1, 2011 at 4:59 AM, Aaron Colwell <acolw...@google.com> wrote:
>>> I've also been looking at the WebRTC MediaStream API and was wondering if
>>> it makes more sense to create an object similar to the LocalMediaStream
>>> object. This has the benefit of unifying how media streams are handled,
>>> independent of whether they come from a camera or a JavaScript-based
>>> streaming algorithm. This could also enable sending the media stream
>>> through a peer-to-peer connection instead of only allowing a camera as a
>>> source. Here is an example of the type of object I'm talking about.
>>
>> I think MediaStreams should not be dealing with compressed data except as
>> an optimization when access to decoded data is not required anywhere in
>> the stream pipeline. If you want to do processing of decoded stream data
>> (which I do --- see
>> http://hg.mozilla.org/users/rocallahan_mozilla.com/specs/raw-file/tip/StreamProcessing/StreamProcessing.html),
>> then introducing a decoder inside the stream processing graph creates all
>> sorts of complications.
>
> Nice spec. If I understand correctly, your position is that MediaStreams
> should only represent uncompressed media?

Sort of. I want the data format (compressed vs. uncompressed, etc.) to be
hidden from Web authors unless they use APIs, like Worker-based processing,
that require access to decoded data. What I don't want to have to deal with
is compressed data being injected at arbitrary points in the graph. Right now
the only place compressed data is injected is at stream sources --- media
elements and getUserMedia.

Rob
--
If we claim to be without sin, we deceive ourselves and the truth is not in
us. If we confess our sins, he is faithful and just and will forgive us our
sins and purify us from all unrighteousness. If we claim we have not sinned,
we make him out to be a liar and his word is not in us. [1 John 1:8-10]
Re: [whatwg] Extending HTML 5 video for adaptive streaming
Hi Robert, comments inline.

On Thu, Jun 30, 2011 at 4:13 PM, Robert O'Callahan <rob...@ocallahan.org> wrote:
> On Fri, Jul 1, 2011 at 4:59 AM, Aaron Colwell <acolw...@google.com> wrote:
>> I've also been looking at the WebRTC MediaStream API and was wondering if
>> it makes more sense to create an object similar to the LocalMediaStream
>> object. This has the benefit of unifying how media streams are handled,
>> independent of whether they come from a camera or a JavaScript-based
>> streaming algorithm. This could also enable sending the media stream
>> through a peer-to-peer connection instead of only allowing a camera as a
>> source. Here is an example of the type of object I'm talking about.
>
> I think MediaStreams should not be dealing with compressed data except as
> an optimization when access to decoded data is not required anywhere in the
> stream pipeline. If you want to do processing of decoded stream data (which
> I do --- see
> http://hg.mozilla.org/users/rocallahan_mozilla.com/specs/raw-file/tip/StreamProcessing/StreamProcessing.html),
> then introducing a decoder inside the stream processing graph creates all
> sorts of complications.

Nice spec. If I understand correctly, your position is that MediaStreams
should only represent uncompressed media? In the case of camera/mic data they
would represent the uncompressed bits before they go to the codec for
transmission over a PeerConnection, or before they are rendered by an
<audio>/<video> element. In the case of standard audio/video playback they
would represent the uncompressed audio before it is sent to the audio card
and the uncompressed video before it is blitted onto the screen. From a
stream-processing point of view I can see how this makes sense. I was just
thinking that LocalMediaStream is just a wrapper around a source of media
data, and all I was doing was providing a mechanism to supply media data from
JavaScript instead of from hardware.

> I think the natural way to support the functionality you're looking for is
> to extend the concept of Blob URLs. Right now you can create a binary Blob,
> mint a URL for it and set that URL as the source for a media element. The
> only extension you need is the ability to append data to the Blob while
> retaining the same URL; you would need to initially mark the Blob as "open"
> to indicate to URL consumers that the data stream has not ended. That
> extension would be useful for all sorts of things because you can use those
> Blob URLs anywhere. An alternative would be to create a new kind of object
> representing an appendable sequence of Blobs and create an API to mint URLs
> for it.

I thought about that, but I saw an earlier WHATWG thread
(http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/032221.html)
which led me down this MediaStream path. Using MediaStreams made more sense
to me because my use case felt similar to the live capture case, except that
I'm using compressed media and it comes from JavaScript instead of hardware.
Also, MediaStream already had a way to pass stream URLs to <audio>/<video>
for camera and remote peer stream data, so I figured I could just leverage
that.

> Note that with my API proposal above, you can get a MediaStream from a
> media element that's using any URL and send that through a PeerConnection.

I see that. Interactions with PeerConnection were not a primary concern for
me. I was only mentioning it as a side benefit of using MediaStream.

Thanks for your comments. I appreciate them.

Aaron
Re: [whatwg] Extending HTML 5 video for adaptive streaming
Hi Adam,

On Thu, Jun 30, 2011 at 5:20 PM, Adam Malcontenti-Wilson <adman.com@gmail.com> wrote:
> @acolwell: Is the appendData method one you're suggesting or one already
> specified/existing?

I'm suggesting it. It was a quick and dirty way to try out some ideas I had
while working on a prototype for Chromium. Now that I actually want to take
this out of the prototype stage, I'm trying to get a sense of whether
appendData() or a MediaStream-based solution is more desirable.

> @robert: Some problems with the concept of Blobs being appended to, or what
> I have previously described as "Streaming Blobs", were mentioned at
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/032221.html
> I'm not exactly sure what that meant - but I'd expect the ideas discussed
> are similar.

I saw this thread as well, which is why I went down the MediaStream path. :)

Aaron
Re: [whatwg] Extending HTML 5 video for adaptive streaming
Hi Aaron,

Here are some other aspects of script-controlled adaptive bit rate that occur
to me; perhaps you have already considered these.

1) I guess script will be responsible for maintaining its own playback
buffer, monitoring buffer behavior and selecting the appropriate bit rate for
new fragments. Are there any other network-related events/metrics script
might need to determine which bit rate to fetch for the next segment? Is
there any other information from the user agent about playback performance
that script might need?

2) If a media resource is a multi-track resource then it would seem script
will also have to fetch fragments for those tracks, which implies that the
audio element would need the append method. Timed text tracks would also need
to be processed and Cues appended.

There is a new media pipeline task force in the Web and TV IG
(http://www.w3.org/2011/webtv/wiki/MPTF) that is also planning to examine
this topic. You may want to participate.

Regards,
Bob Lund

-----Original Message-----
From: whatwg-boun...@lists.whatwg.org [mailto:whatwg-boun...@lists.whatwg.org]
On Behalf Of Aaron Colwell
Sent: Thursday, June 30, 2011 10:59 AM
To: wha...@whatwg.org
Subject: [whatwg] Extending HTML 5 video for adaptive streaming

Hi,

I've been working on an adaptive streaming prototype that uses JavaScript to
fetch chunks of media and feeds them to the video tag for decoding. The idea
is to let the adaptation algorithm and CDN interactions happen in JavaScript
so that they can evolve without the need for browser changes. I'm looking for
some guidance about the preferred method for adding this type of
functionality. I'm new to this process so please bear with me.

My initial implementation is built around WebM, but I believe this could work
for Ogg & MP4 as well. The basic idea is to initialize the <video> tag with
stream initialization data (i.e. WebM info & tracks elements) via the src
attribute and then send media chunks (i.e. WebM clusters) to the tag via a
new appendData() method on <video>. Here is a simple example of what I'm
talking about.

  <video id="v" autoplay></video>
  <script>
    function needMoreData(e) {
      e.target.appendData(getNextCluster());
    }

    function onSeeking(e) {
      var video = e.target;
      video.appendData(findClusterForTime(video.currentTime));
    }

    var video = document.getElementById('v');
    video.addEventListener('loadstart', needMoreData);
    video.addEventListener('stalled', needMoreData);
    video.addEventListener('seeking', onSeeking);
    video.src = URL.createObjectURL(createStreamInitBlob());
  </script>

appendData() expects to receive a Uint8Array that contains WebM cluster
elements. The first cluster passed to appendData() initializes the starting
playback position. Also, after a seeking event fires, the first appendData()
updates the current position to the seek point.

I've also been looking at the WebRTC MediaStream API and was wondering if it
makes more sense to create an object similar to the LocalMediaStream object.
This has the benefit of unifying how media streams are handled, independent
of whether they come from a camera or a JavaScript-based streaming algorithm.
This could also enable sending the media stream through a peer-to-peer
connection instead of only allowing a camera as a source. Here is an example
of the type of object I'm talking about.
  interface GeneratedMediaStream : MediaStream {
    void init(in DOMString type, in Uint8Array init_data);
    void appendData(in DOMString trackId, in Uint8Array data);
    void endOfStream();

    readonly attribute MultipleTrackList audioTracks;
    readonly attribute ExclusiveTrackList videoTracks;
  };

type - identifies the type of stream we are generating (i.e.
video/x-webm-cluster-stream or video/ogg-page-stream).

init_data - provides initialization data that indicates the number of tracks,
codec configs, etc. (i.e. WebM info & tracks elements or Ogg header pages).

trackId - indicates which track the data is for. If this is an empty string
then multiplexed data is being passed in. If not empty, trackId matches the
id of a track in the TrackList objects.

data - media data chunk (i.e. WebM cluster or Ogg page). Data is expected to
have monotonically increasing timestamps, no gaps, etc.

Here are my questions:
- Is there a preference for appendData() vs a new MediaStream object?
- If the MediaStream object is preferred, should it be constructed through
  Navigator.getUserMedia()? I'm unclear about what the criteria are for
  adding this to Navigator vs allowing direct object construction.
- Are there existing efforts along these lines? If so, please point me to
  them.

Thanks for your help,
Aaron
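For concreteness: the getNextCluster() helper in the appendData() example
quoted above is left undefined in the proposal. A minimal sketch of how it
might be implemented with XMLHttpRequest, assuming (purely for illustration)
that the server exposes the stream as numbered cluster files:

  // Minimal sketch only: the proposal does not define how clusters are
  // fetched. The 'clusters/N.webm' URL scheme and the fetchNextCluster
  // name are assumptions made for this example.
  var clusterIndex = 0;

  function fetchNextCluster(video) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'clusters/' + clusterIndex + '.webm');
    xhr.responseType = 'arraybuffer';
    xhr.onload = function() {
      clusterIndex++;
      // Hand the raw WebM cluster bytes to the proposed appendData() method.
      video.appendData(new Uint8Array(xhr.response));
    };
    xhr.send();
  }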
Re: [whatwg] Extending HTML 5 video for adaptive streaming
Hi Bob, comments inline.

On Fri, Jul 1, 2011 at 8:40 AM, Bob Lund <b.l...@cablelabs.com> wrote:
> Here are some other aspects of script-controlled adaptive bit rate that
> occur to me; perhaps you have already considered these.
>
> 1) I guess script will be responsible for maintaining its own playback
> buffer, monitoring buffer behavior and selecting the appropriate bit rate
> for new fragments. Are there any other network-related events/metrics
> script might need to determine which bit rate to fetch for the next
> segment? Is there any other information from the user agent about playback
> performance that script might need?

The script would be responsible for managing buffering. It can use the
currentTime & buffered attributes on the video tag to monitor the consumption
of the data passed in via appendData(). I believe the attributes being
proposed in the video metrics proposal
(http://wiki.whatwg.org/wiki/Video_Metrics#Proposal) could also be helpful.
Right now I'm just using XMLHttpRequest to fetch WebM clusters and measuring
how long it takes to fetch them to create a bandwidth estimate. I haven't
spent much time on the bandwidth measurement & adaptation algorithms yet. I'm
just trying to nail down the mechanism for passing the media data to the
browser first.

> 2) If a media resource is a multi-track resource then it would seem script
> will also have to fetch fragments for those tracks, which implies that the
> audio element would need the append method. Timed text tracks would also
> need to be processed and Cues appended.

The idea is that appendData() can receive media for multiple tracks. In the
case of WebM, each cluster can have blocks from different tracks multiplexed
together. The initial stream config information contains the track mappings
necessary to demux the cluster. I was also planning to allow both multiplexed
and demultiplexed clusters. Cluster timecodes must be in monotonically
increasing order, but it would be possible to call appendData() with a
cluster containing only audio data followed by a cluster containing only
video data. This would allow straightforward support for deployments where
the audio & video tracks for a single presentation are in separate WebM
files.

> There is a new media pipeline task force in the Web and TV IG
> (http://www.w3.org/2011/webtv/wiki/MPTF) that is also planning to examine
> this topic. You may want to participate.

I have signed up to the mailing list and will take some time to catch up with
the archives.

Thanks for your comments.

Aaron
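A sketch of the kind of measurement described above: time each
XMLHttpRequest fetch and keep a smoothed estimate that an adaptation
algorithm could compare against the bitrates of the available streams. The
function name and the 0.8/0.2 smoothing weights are illustrative
assumptions, not anything from the proposal:

  // Illustrative sketch: time each cluster fetch and maintain an
  // exponentially weighted moving average of observed bandwidth.
  var estimatedBps = 0;

  function fetchClusterAndMeasure(url, callback) {
    var xhr = new XMLHttpRequest();
    var startTime = Date.now();
    xhr.open('GET', url);
    xhr.responseType = 'arraybuffer';
    xhr.onload = function() {
      var seconds = (Date.now() - startTime) / 1000;
      var bps = (xhr.response.byteLength * 8) / seconds;
      // Smooth per-fetch noise; the weights are arbitrary for the example.
      estimatedBps = estimatedBps ? (0.8 * estimatedBps + 0.2 * bps) : bps;
      callback(new Uint8Array(xhr.response));
    };
    xhr.send();
  }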
[whatwg] Extending HTML 5 video for adaptive streaming
Hi,

I've been working on an adaptive streaming prototype that uses JavaScript to
fetch chunks of media and feeds them to the video tag for decoding. The idea
is to let the adaptation algorithm and CDN interactions happen in JavaScript
so that they can evolve without the need for browser changes. I'm looking for
some guidance about the preferred method for adding this type of
functionality. I'm new to this process so please bear with me.

My initial implementation is built around WebM, but I believe this could work
for Ogg & MP4 as well. The basic idea is to initialize the <video> tag with
stream initialization data (i.e. WebM info & tracks elements) via the src
attribute and then send media chunks (i.e. WebM clusters) to the tag via a
new appendData() method on <video>. Here is a simple example of what I'm
talking about.

  <video id="v" autoplay></video>
  <script>
    function needMoreData(e) {
      e.target.appendData(getNextCluster());
    }

    function onSeeking(e) {
      var video = e.target;
      video.appendData(findClusterForTime(video.currentTime));
    }

    var video = document.getElementById('v');
    video.addEventListener('loadstart', needMoreData);
    video.addEventListener('stalled', needMoreData);
    video.addEventListener('seeking', onSeeking);
    video.src = URL.createObjectURL(createStreamInitBlob());
  </script>

appendData() expects to receive a Uint8Array that contains WebM cluster
elements. The first cluster passed to appendData() initializes the starting
playback position. Also, after a seeking event fires, the first appendData()
updates the current position to the seek point.

I've also been looking at the WebRTC MediaStream API and was wondering if it
makes more sense to create an object similar to the LocalMediaStream object.
This has the benefit of unifying how media streams are handled, independent
of whether they come from a camera or a JavaScript-based streaming algorithm.
This could also enable sending the media stream through a peer-to-peer
connection instead of only allowing a camera as a source. Here is an example
of the type of object I'm talking about.

  interface GeneratedMediaStream : MediaStream {
    void init(in DOMString type, in Uint8Array init_data);
    void appendData(in DOMString trackId, in Uint8Array data);
    void endOfStream();

    readonly attribute MultipleTrackList audioTracks;
    readonly attribute ExclusiveTrackList videoTracks;
  };

type - identifies the type of stream we are generating (i.e.
video/x-webm-cluster-stream or video/ogg-page-stream).

init_data - provides initialization data that indicates the number of tracks,
codec configs, etc. (i.e. WebM info & tracks elements or Ogg header pages).

trackId - indicates which track the data is for. If this is an empty string
then multiplexed data is being passed in. If not empty, trackId matches the
id of a track in the TrackList objects.

data - media data chunk (i.e. WebM cluster or Ogg page). Data is expected to
have monotonically increasing timestamps, no gaps, etc.

Here are my questions:
- Is there a preference for appendData() vs a new MediaStream object?
- If the MediaStream object is preferred, should it be constructed through
  Navigator.getUserMedia()? I'm unclear about what the criteria are for
  adding this to Navigator vs allowing direct object construction.
- Are there existing efforts along these lines? If so, please point me to
  them.

Thanks for your help,
Aaron
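To make the second alternative concrete, a sketch of how the proposed
GeneratedMediaStream might be driven. It assumes direct construction and
that URL.createObjectURL() accepts the stream, both of which are open
questions above; the getWebMInitSegment() and getNextCluster() helpers are
hypothetical:

  // Hypothetical usage of the proposed object; nothing here is specified.
  var stream = new GeneratedMediaStream();

  // Configure the stream with WebM info & tracks elements (a Uint8Array).
  stream.init('video/x-webm-cluster-stream', getWebMInitSegment());

  // Hook the stream up to a <video> element the same way camera and
  // remote-peer streams are hooked up today.
  var video = document.getElementById('v');
  video.src = URL.createObjectURL(stream);

  // An empty trackId means the cluster carries multiplexed data.
  stream.appendData('', getNextCluster());

  // Signal that no more data will be appended.
  stream.endOfStream();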
Re: [whatwg] Extending HTML 5 video for adaptive streaming
On Fri, Jul 1, 2011 at 4:59 AM, Aaron Colwell <acolw...@google.com> wrote:
> I've also been looking at the WebRTC MediaStream API and was wondering if
> it makes more sense to create an object similar to the LocalMediaStream
> object. This has the benefit of unifying how media streams are handled,
> independent of whether they come from a camera or a JavaScript-based
> streaming algorithm. This could also enable sending the media stream
> through a peer-to-peer connection instead of only allowing a camera as a
> source. Here is an example of the type of object I'm talking about.

I think MediaStreams should not be dealing with compressed data except as an
optimization when access to decoded data is not required anywhere in the
stream pipeline. If you want to do processing of decoded stream data (which I
do --- see
http://hg.mozilla.org/users/rocallahan_mozilla.com/specs/raw-file/tip/StreamProcessing/StreamProcessing.html),
then introducing a decoder inside the stream processing graph creates all
sorts of complications.

I think the natural way to support the functionality you're looking for is to
extend the concept of Blob URLs. Right now you can create a binary Blob, mint
a URL for it and set that URL as the source for a media element. The only
extension you need is the ability to append data to the Blob while retaining
the same URL; you would need to initially mark the Blob as "open" to indicate
to URL consumers that the data stream has not ended. That extension would be
useful for all sorts of things because you can use those Blob URLs anywhere.
An alternative would be to create a new kind of object representing an
appendable sequence of Blobs and create an API to mint URLs for it.

Note that with my API proposal above, you can get a MediaStream from a media
element that's using any URL and send that through a PeerConnection.

Rob
--
If we claim to be without sin, we deceive ourselves and the truth is not in
us. If we confess our sins, he is faithful and just and will forgive us our
sins and purify us from all unrighteousness. If we claim we have not sinned,
we make him out to be a liar and his word is not in us. [1 John 1:8-10]
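A sketch of what the appendable-Blob idea described above could look like in
script. Every name here (OpenBlob, append, close) is invented purely for
illustration; no such API exists in any spec:

  // Hypothetical API: a Blob created in the "open" state whose URL stays
  // valid while data is appended to it.
  var blob = new OpenBlob('video/webm');

  var video = document.getElementById('v');
  video.src = URL.createObjectURL(blob); // same URL as the blob grows

  blob.append(getWebMInitSegment());     // consumers see a growing stream
  blob.append(getNextCluster());
  blob.close();                          // marks the data stream as ended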
Re: [whatwg] Extending HTML 5 video for adaptive streaming
@acolwell: Is the appendData method one you're suggesting or one already
specified/existing?

@robert: Some problems with the concept of Blobs being appended to, or what I
have previously described as "Streaming Blobs", were mentioned at
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/032221.html
I'm not exactly sure what that meant - but I'd expect the ideas discussed are
similar.

On Fri, Jul 1, 2011 at 9:13 AM, Robert O'Callahan <rob...@ocallahan.org> wrote:
> On Fri, Jul 1, 2011 at 4:59 AM, Aaron Colwell <acolw...@google.com> wrote:
>> I've also been looking at the WebRTC MediaStream API and was wondering if
>> it makes more sense to create an object similar to the LocalMediaStream
>> object. This has the benefit of unifying how media streams are handled,
>> independent of whether they come from a camera or a JavaScript-based
>> streaming algorithm. This could also enable sending the media stream
>> through a peer-to-peer connection instead of only allowing a camera as a
>> source. Here is an example of the type of object I'm talking about.
>
> I think MediaStreams should not be dealing with compressed data except as
> an optimization when access to decoded data is not required anywhere in the
> stream pipeline. If you want to do processing of decoded stream data (which
> I do --- see
> http://hg.mozilla.org/users/rocallahan_mozilla.com/specs/raw-file/tip/StreamProcessing/StreamProcessing.html),
> then introducing a decoder inside the stream processing graph creates all
> sorts of complications.
>
> I think the natural way to support the functionality you're looking for is
> to extend the concept of Blob URLs. Right now you can create a binary Blob,
> mint a URL for it and set that URL as the source for a media element. The
> only extension you need is the ability to append data to the Blob while
> retaining the same URL; you would need to initially mark the Blob as "open"
> to indicate to URL consumers that the data stream has not ended. That
> extension would be useful for all sorts of things because you can use those
> Blob URLs anywhere. An alternative would be to create a new kind of object
> representing an appendable sequence of Blobs and create an API to mint URLs
> for it.
>
> Note that with my API proposal above, you can get a MediaStream from a
> media element that's using any URL and send that through a PeerConnection.
>
> Rob
> --
> If we claim to be without sin, we deceive ourselves and the truth is not in
> us. If we confess our sins, he is faithful and just and will forgive us our
> sins and purify us from all unrighteousness. If we claim we have not
> sinned, we make him out to be a liar and his word is not in us.
> [1 John 1:8-10]

--
Adam Malcontenti-Wilson