Re: [whatwg] PeerConnection, MediaStream, getUserMedia(), and other feedback

Per-Erik Brodin Tue, 02 Aug 2011 02:06:09 -0700

On 2011-07-26 07:30, Ian Hickson wrote:

On Tue, 19 Jul 2011, Per-Erik Brodin wrote:


Perhaps now that there is no longer any relation to tracks on the media
elements we could also change Track to something else, maybe Component.
I have had people complaining to me that Track is not really a good name
here.


I'm happy to change the name if there's a better one. I'm not sure
Component is any better than Track though.


OK, let's keep Track until someone comes up with a better name then.

Good. Could we still keep audio and video in separate lists though? It
makes it easier to check the number of audio or video components and you
can avoid loops that have to check the kind for each iteration if you
only want to operate on one media type.


Well in most (almost all?) cases, there'll be at most one audio track and
at most one video track, which is why I didn't put them in separate lists.
What use cases did you have in mind where there would be enough tracks
that it would be better for them to be separate lists?

Yes, you're right, but even with zero or one track it's more convenient
to have them separate because that way you can more easily check if the
stream contains any audio and/or video tracks and check the number of
tracks of each kind. I also think it will be problematic if we would
like to add another kind at a later stage if all tracks are in the same
list since people will make assumptions that audio and video are the
only kinds.
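To illustrate the convenience argument, here is a minimal sketch that
compares the two layouts. All type and class names (SketchTrack,
SingleListStream, SeparateListStream) are hypothetical and exist only
for this comparison; they are not the interfaces from the draft.

```typescript
// Hypothetical model types for illustration; not the draft API.
type TrackKind = "audio" | "video";

interface SketchTrack {
  kind: TrackKind;
  label: string;
  enabled: boolean;
}

// Single-list layout: every query has to inspect the kind of each track.
class SingleListStream {
  tracks: SketchTrack[];
  constructor(tracks: SketchTrack[]) {
    this.tracks = tracks;
  }
}

// Separate-list layout: one list per kind.
class SeparateListStream {
  audioTracks: SketchTrack[];
  videoTracks: SketchTrack[];
  constructor(audioTracks: SketchTrack[], videoTracks: SketchTrack[]) {
    this.audioTracks = audioTracks;
    this.videoTracks = videoTracks;
  }
}

const mic: SketchTrack = { kind: "audio", label: "Microphone", enabled: true };
const cam: SketchTrack = { kind: "video", label: "Camera", enabled: true };

const single = new SingleListStream([mic, cam]);
const separate = new SeparateListStream([mic], [cam]);

// One list: checking for video means scanning all tracks.
const hasVideoSingle = single.tracks.some((t) => t.kind === "video");
const audioCountSingle = single.tracks.filter((t) => t.kind === "audio").length;

// Separate lists: the same checks are plain length lookups.
const hasVideoSeparate = separate.videoTracks.length > 0;
const audioCountSeparate = separate.audioTracks.length;
```

Both layouts answer the same questions; the difference is only in how
much per-kind filtering the caller has to write.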

I also think that it would be easier to construct new MediaStream
objects from individual components rather than temporarily disabling the
ones you do not want to copy to the new MediaStream object and then
re-enabling them again afterwards.


Re-enabling them afterwards would re-include them in the copies, too.

Why is this needed? If a new MediaStream object is constructed from
another MediaStream I think it would be simpler to just let that be a
clone of the stream with all tracks present (with the enabled/disabled
states independently set).

The main use case here is temporarily disabling a video or audio track in
a video conference. I don't understand how your proposal would work for
that. Can you elaborate?

A new MediaStream object is created from the video track of a
LocalMediaStream to be used as self-view. The LocalMediaStream can then
be sent over PeerConnection and the video track disabled without
affecting the MediaStream being played back locally in the self-view. In
addition, my proposal opens up for additional use cases that require
combining tracks from different streams, such as recording a
conversation (a number of audio tracks from various streams, local and
remote combined to a single stream).
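The self-view and recording cases can be sketched roughly as follows.
This is only a model of the idea, assuming that a stream built from
existing tracks gets its own track objects over the same underlying
sources; SketchSource, SketchStreamTrack and SketchMediaStream are
hypothetical names, not the draft interfaces.

```typescript
// Hypothetical model; names and behaviour are assumptions for illustration.
interface SketchSource {
  label: string;
}

class SketchStreamTrack {
  source: SketchSource;
  enabled: boolean;
  constructor(source: SketchSource) {
    this.source = source;
    this.enabled = true;
  }
}

class SketchMediaStream {
  tracks: SketchStreamTrack[];
  // Assumption: a new stream wraps the same sources in fresh track
  // objects, so its enabled/disabled states are independent.
  constructor(from: SketchStreamTrack[]) {
    this.tracks = from.map((t) => new SketchStreamTrack(t.source));
  }
}

const microphone: SketchSource = { label: "Microphone" };
const camera: SketchSource = { label: "Camera" };

// Stand-in for the LocalMediaStream returned by getUserMedia().
const local = new SketchMediaStream([
  new SketchStreamTrack(microphone),
  new SketchStreamTrack(camera),
]);
const localAudio = local.tracks[0];
const localVideo = local.tracks[1];

// Self-view built from the video track only.
const selfView = new SketchMediaStream([localVideo]);

// Disabling video on the stream being sent leaves the self-view alone.
localVideo.enabled = false;

// Recording mix: audio tracks taken from different streams.
const remote = new SketchMediaStream([
  new SketchStreamTrack({ label: "Remote microphone" }),
]);
const recordingMix = new SketchMediaStream([localAudio, remote.tracks[0]]);
```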

It is also unclear to me what happens to a LocalMediaStream object that
is currently being consumed in that case.


Not sure what you mean. Can you elaborate?

I was under the impression that, if a stream of audio and video is being
sent to one peer and then another peer joins but only audio should be
sent, then video would have to be temporarily disabled in the first
stream in order to construct a new MediaStream object containing only
the audio track. Again, it would be simpler to construct a new
MediaStream object from just the audio track and send that.

Why should the label be the same as the parent on the newly constructed
MediaStream object?


The label identifies the source of the media. It's the same source, so,
same label.

I agree, but usually you have more than one source in a MediaStream and
if you construct a new MediaStream from it which doesn't contain all of
the sources from the parent I don't think the label should be the same.
By the way, what happens if you call getUserMedia() twice and get the
same set of sources both times, do you get the same label then? What if
the user selects different sources the second time?

If you send two MediaStream objects constructed from the same
LocalMediaStream over a PeerConnection there needs to be a way to
separate them on the receiving side.


What's the use case for sending the same feed twice?

If the labels are the same then that should indicate that it's
essentially the same stream and there should be no need to send it
twice. If the streams are not composed of the same underlying sources
then you may want to send them both and the labels should differ.

I also think it is a bit unfortunate that we now have a 'label' property
on the track objects that means something else than the 'label' property
on MediaStream, perhaps 'description' would be a more suitable name for
the former.


In what sense do they mean different things? I don't understand the
problem here. Can you elaborate?

As Tommy pointed out, label on MediaStream is an identifier for the
stream whereas label on MediaStreamTrack is a description of the source.

The current design is just the result of needing to define what
happens when you call getRecordedData() twice in a row. Could you
elaborate on what API you think we should have?


What I am thinking of is something similar to what was proposed in
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-March/030921.html


That doesn't answer the question of what happens if you call stop() twice.


Nothing will happen the second time since recording has already stopped.

(Also, having to call a method and hook an event so that you can read an
attribute seems like a rather round-about way of getting data. Is calling
a method with a callback not simpler?)

When the event has been fired you can read the attribute whenever you
want to get the blob, how many times you want. I prefer that over having
stop() take a callback argument.
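A rough sketch of the event-plus-attribute shape described above, with a
second stop() call doing nothing. SketchRecorder, onrecorded and
recordedData are hypothetical names, a string stands in for the blob,
and the event is fired synchronously purely to keep the example short.

```typescript
// Hypothetical recorder model; not the API from the draft or the
// linked proposal.
class SketchRecorder {
  private chunks: string[] = [];
  private recording = true;

  // Readable any number of times once the event has fired.
  recordedData: string | null = null;
  onrecorded: (() => void) | null = null;

  write(chunk: string): void {
    if (this.recording) {
      this.chunks.push(chunk);
    }
  }

  stop(): void {
    if (!this.recording) {
      return; // Already stopped: the second call has no effect.
    }
    this.recording = false;
    this.recordedData = this.chunks.join("");
    if (this.onrecorded) {
      this.onrecorded();
    }
  }
}

let recordedEvents = 0;
const recorder = new SketchRecorder();
recorder.onrecorded = () => {
  recordedEvents += 1;
};

recorder.write("a");
recorder.write("b");
recorder.stop();
recorder.stop(); // No second event, no change to recordedData.

const firstRead = recorder.recordedData;
const secondRead = recorder.recordedData;
```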

Quota doesn't seem particularly important here. It's not like you can
really do lasting damage. It would just be a DOS attack, like creating a
Web page with an infinite number of 10000x10000 canvases. We can just let
the "hardware limitation" clause handle it.

In a video blog recording application it would be nice to be able to
present to the user how much more can be recorded and not just handle it
as a hardware limitation, since that could mean dropping the entire
recording.

I was not saying that it would not be possible to keep track of which
blob: URLs point to blobs and which point to streams, just that we
want to avoid doing that in the early stage of the media engine
selection. In my opinion a stream is quite the opposite of a blob
(unknown, perhaps infinite length vs. fixed length) so when printing the
URLs for debugging purposes it would also be much nicer to have two
different protocol schemes. If I remember correctly the discussions
leading up to the renaming of createBlobURL to createObjectURL assumed
that there would be stream: URLs.


You wouldn't be able to remove that logic, since http: URLs would still
have the same needs. You can have finite and infinite http: resources,
just like you can have finite and infinite blob: resources. I don't really
see the problem here. Indeed, with blob:, it's trivial to find out if the
resource is finite or not; with http: you might not know until the whole
finite resource is downloaded.

If there is something I'm missing here please do let me know.

The differentiation is not between finite and infinite resources but
rather between playback media resources and conversational media
resources. blob: and http: are both handled by the playback media engine
whereas stream: is handled by the conversational media engine. We would
like to be able to determine which engine to use by simply looking at
the URL.
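As a sketch of what "simply looking at the URL" could mean, assuming a
stream: scheme existed: selectEngine and the Engine type are
hypothetical, and the example URLs are made up.

```typescript
// Hypothetical engine selection based on the URL scheme alone.
type Engine = "playback" | "conversational";

function selectEngine(url: string): Engine {
  const colon = url.indexOf(":");
  if (colon < 0) {
    return "playback"; // No scheme: treat as an ordinary resource.
  }
  const scheme = url.slice(0, colon).toLowerCase();
  // Assumption: only the proposed stream: scheme goes to the
  // conversational engine; blob:, http: and the rest go to playback.
  return scheme === "stream" ? "conversational" : "playback";
}

const engineForStream = selectEngine("stream:example-id");
const engineForBlob = selectEngine("blob:example-id");
const engineForHttp = selectEngine("http://example.com/video.webm");
```

With a shared blob: scheme the same decision would need a lookup of what
the URL refers to before an engine could be picked.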

PeerConnection is an EventTarget but it still uses a callback for
the signaling messages and this mixture of events and callbacks is a
bit awkward in my opinion. If you would like to change the function
that handles signaling messages after calling the constructor you
would have to wrap a function call inside the callback to the actual
signal handling function, instead of just (re-)setting an onsignal
(or whatever) attribute listener (the event could reuse the
MessageEvent interface).


When would you change the callback?


If you would like to send the signaling messages peer-to-peer over the
data channel, once it is established.


That seems like a disaster waiting to happen. The UDP data channel is
unreliable, the signaling channel has to be reliable. Worse, the UDP data
channel might go down at any second, and then the user agent would try to
re-establish it using the signaling channel.

You can provide a reliable channel on top of the unreliable channel and
monitor the PeerConnection state so that you know when to fall back to
server-relayed signaling. One reason to do this would be to improve the
signaling latency which can be of importance in applications that, for
example, trigger format renegotiation due to change in video display
size.
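The fallback part of this can be sketched as a small router that picks
the path per message. SketchSignalingRouter and its members are
hypothetical, and the reliability layer itself (acknowledgements and
retransmission over the data channel) is deliberately left out.

```typescript
// Hypothetical signaling router; the reliable-delivery layer is omitted.
type SendFn = (message: string) => void;

class SketchSignalingRouter {
  // Assumption: the application updates this flag while monitoring
  // the PeerConnection state.
  peerChannelOpen = false;
  private viaServer: SendFn;
  private viaPeer: SendFn;

  constructor(viaServer: SendFn, viaPeer: SendFn) {
    this.viaServer = viaServer;
    this.viaPeer = viaPeer;
  }

  send(message: string): void {
    if (this.peerChannelOpen) {
      this.viaPeer(message); // Lower latency while the channel is up.
    } else {
      this.viaServer(message); // Fall back to server-relayed signaling.
    }
  }
}

const signalingLog: string[] = [];
const router = new SketchSignalingRouter(
  (m) => signalingLog.push("server:" + m),
  (m) => signalingLog.push("peer:" + m),
);

router.send("offer"); // Data channel not yet established.
router.peerChannelOpen = true;
router.send("renegotiate"); // Sent peer-to-peer.
router.peerChannelOpen = false;
router.send("restart"); // Channel went down: back to the server.
```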

   - It's easy to not register a callback, which makes no sense. There's
     literally never a use for creating a PeerConnection without a
     signaling channel, as far as I can tell, so making it easier to
     create one without a callback than with seems like a bad design.

For example, creating an EventSource without registering any listener
for incoming events equally does not make sense.


Actually, it does. One operation mode for EventSource is to have events
with different names, each triggering a different event listener.

An EventSource without any event listener seems rather useless to me.
Even if you can assign multiple handlers for events with different
names, all those handlers could still be provided as arguments to the
constructor, right? That would ensure that nobody can create an
EventSource without registering at least one event listener.

There is a potential problem in the exchange of SDPs in that glare
conditions can occur if both peers add streams simultaneously, in
which case there will be two different outstanding offers that none
of the peers are allowed to respond to according to the SDP
offer-answer model. Instead of using one SDP session for all media
as the specification suggests, we are handling the offer-answer for
each stream separately to avoid such conditions.


Why isn't this handled by the ICE role conflict processing rules? It
seems like simultaneous ICE restarts would be trivially resolvable by
just following the rules in the ICE spec. Am I missing something?


This problem is not related to ICE but rather to the SDP offer-answer
model which is separate from the ICE processing. The problem is that SDP
offer-answer does not allow you to respond to an offer when you have an
outstanding offer for the same set of streams.


As far as I can tell, your interpretation is incorrect. This is entirely
related to ICE, and ICE, as far as I can tell, defines this exact case in
its role conflict resolution.

The only time this can happen is if you have both ends do an ICE restart
at exactly the same time. The offer from each ICE agent will be received
by the other as if it was the response, and thus there will be a role
conflict and the ICE role conflict resolution process will kick in. No?

No, an ICE role conflict is not the same thing as a glare condition in
SDP offer-answer.


//Per-Erik
