[Standards] Re: Proposed XMPP Extension: Jingle Remote Control

Goffi Tue, 21 May 2024 09:05:07 -0700

On Tuesday, 21 May 2024, 16:39:28 UTC+2, Marvin W wrote:
> Hi Goffi,
> 
> On Tue, 2024-05-21 at 12:47 +0200, Goffi wrote:
> > I know that, I've just ruled out using <message> through the server
> > as it has 
> > been proposed in another feedback.
> 
> Why do you rule that out? Because you don't see a purpose, when my
> whole point is that I do see a purpose? Of course I can send whatever
> CBOR/JSON you come up with as a base64 blob inside a <message> for my
> usecase, but then I wonder why not to handle it in first place.


I missed that you had a particular use case in mind for using <message> via 
the server.

Could you please tell me more about this use case and why it isn't covered by 
Jingle transmission? Specifically, what is the advantage of sending <message> 
through the server instead of handling it directly? I'd like to understand 
your perspective better.
 
> [SNIP]
> RFB definitely is old, so these kind of things are expected. And, while
> I see that you added clipboard as a potential future extension, it
> seems odd to complain that RFB has a suboptimal implementation of a
> feature your proposed XEP currently doesn't have at all.

That's just something that jumped out at me after a quick check: if UTF-8 is 
not natively supported, I can see problems coming. But I'll look into it in 
more detail anyway; as I've said before, I'm not against a protocol change if 
it makes sense.

As for it seeming odd to you because clipboard sharing is not yet specified: 
that's simply anticipation.

> 
> [SNIP]
>  
> I know that your specification doesn't transfer the modifier flags,
> probably assuming they are superfluous. However, if your browser client
> was to naively send the key events it receives as is without further
> checking for plausibility, things will go wrong: I tested pressing the
> keys that would logically result in the events meta down, control down,
> control up, meta up and here are the results on different browsers:
> https://imgur.com/a/zVxDAVa

It's the job of the controlling client to ensure consistency. That could be 
specified in the business rules, though.
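
As a rough illustration of what such a consistency check could look like on 
the controlling side, here is a minimal Python sketch. It is not part of the 
proposal: the class name, the method names and the choice to drop repeated 
keydown events are all assumptions made for the example.

```python
class KeyStateFilter:
    """Track pressed keys and drop events that would be inconsistent.

    Hypothetical helper, not defined by the proposed XEP.
    """

    def __init__(self):
        self.pressed = set()

    def accept(self, event_type, key):
        """Return True if the event should be forwarded to the controlled device."""
        if event_type == "keydown":
            if key in self.pressed:
                return False  # assumption: repeated keydown is dropped
            self.pressed.add(key)
            return True
        if event_type == "keyup":
            if key not in self.pressed:
                return False  # keyup without a matching keydown
            self.pressed.remove(key)
            return True
        raise ValueError(f"unknown event type: {event_type!r}")

    def release_all(self):
        """Build keyup events for keys still held, e.g. when focus is lost."""
        events = [("keyup", key) for key in sorted(self.pressed)]
        self.pressed.clear()
        return events
```

With this filter, a browser that reports "meta down, control down, meta up" 
and never reports the control keyup can still be brought back to a clean 
state by sending the result of release_all() when the window loses focus.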

> From what I understand, the state of keyup and keydown events in the
> web API doesn't need to be consistent (e.g. there can be keydown
> without keyup and vice-versa). Do we want the same behavior for this
> protocol or something else?

The wire format comes from the web API, but we are not developing browsers, we 
are developing XMPP clients.

> I think you misunderstood my point. Using a smartphone as a touch pad
> or gamepad while playing a game on a screen next to you, is low latency
> feedback (you can see the screen with low latency). Example for where
> you don't need low latency would be when blindly typing into a remote
> shell, because you won't get feedback there (except after confirming a
> command which is probably not low latency).

And we come back to the point where I don't see the need for another way of 
sending input when there is already a low-latency one. I'm not saying that 
you are wrong; I'm saying that I don't see why we should have another 
mechanism when there is already a low-latency way to send input data to any 
device. So please give me one or more use cases where the current 
specification is not valid and would not work, or would work sub-optimally.

At risk of repeating myself: I'm not closed-minded about changing protocols or 
designs, and I appreciate feedback from people with other experiences, but 
please provide clear examples of use cases where the current design is 
incorrect.

> [SNIP]
> > Again, it is not from scratch. It's re-using existing protocols, in a
> > simple, 
> > working, easy-to-implement, and efficient way.
> 
> I was talking about the remote control protocol, which is what runs on
> the topmost layer (inside the webrtc datachannel or whatever other
> Jingle transport is used). This protocol is mostly from scratch (it's
> loosely based on web API events, but then only taking an arbitrarily
> picked subset of events and event properties)

It's not arbitrary at all: it discards data that doesn't make sense in this 
context, and the selection was made while writing an implementation on top of 
the Freedesktop remote control portal.

> Which isn't an issue if web clients are not relevant for my usecase.
> And honestly, any kind of pointing to "you should support web clients"
> sounds weird to me. It certainly is interesting that we can support web
> clients, but really shouldn't siphon into unrelated specifications (and
> this one totally is unrelated to web).

I've already said that I'll rephrase it to make it only a suggestion, without 
the "SHOULD".

> [SNIP]
> My point is: Either it's a Jingle session or it's not part of XMPP.
> Jingle doesn't use WebRTC. It just happens that WebRTC APIs are
> somewhat compatible to Jingle (because they are based on Jingle), but
> from XMPP perspective, you never have WebRTC sessions. I don't know
> exactly what it means to be in the same WebRTC session, but whatever
> you want here, make it more explicit, because people that don't use
> WebRTC APIs should not be required to first read the WebRTC specs (or
> probably implementations source code) to figure out what you mean by
> that.

Right, I'll review this section.

> 
> 
> > The issue is that video feed is used in this case to get the screen
> > dimension. 
> > Without it, we can't get touch event which use absolute position
> > (while for 
> > mouse, there is a relative position mode for exactly this use case).
> 
> That's a problematic design. As I said, clients might scale the video
> to reduce bandwidth use. Dino also has logic to adjust the video
> resolution of cameras depending on available bandwidth.

I was thinking about sending the screen size at the beginning, but the issue 
is when the size changes (e.g. when controlling a remote application and that 
application is resized). The issue with [0,1] coordinates is that you run 
into precision loss or rounding errors. I think that ideally the screen size 
should be sent separately and updated.

> 
> And as I understood for mouse, it's not relative to the screen, but
> relative to the previous position, aka a movement vector, like reported
> from touchpads.
> A screen relative position that is 0,0 is upper left corner, 0.5,0.5
> is center of the screen and 1,1 is lower right corner, would work
> independent of the target screen resolution.

It would not work in the case of an FPS, when you have already reached the 
right edge of your screen and you need to go right again.
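
To make the difference concrete, here is a small Python sketch (purely 
illustrative; the function name and the numbers are made up). It derives the 
motion a game would see from absolute positions clamped to a 1920 pixel wide 
screen and compares it with movement vectors sent directly.

```python
def motion_from_absolute(raw_positions, width):
    """Motion seen by the application when absolute x positions are clamped."""
    clamped = [min(max(x, 0), width - 1) for x in raw_positions]
    return [b - a for a, b in zip(clamped, clamped[1:])]


# The user keeps pushing the pointer to the right, past the screen edge.
raw_positions = [1900, 1919, 1950, 2000]

# Absolute mode: once the edge is reached, no further motion is visible.
absolute_motion = motion_from_absolute(raw_positions, 1920)

# Relative mode: the movement vectors are sent as they are.
relative_motion = [b - a for a, b in zip(raw_positions, raw_positions[1:])]

print(absolute_motion)  # [19, 0, 0]
print(relative_motion)  # [19, 31, 50]
```

In absolute mode the last two moves are lost, which is exactly the "need to 
go right again" case; in relative mode the camera keeps turning.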

> 
> > An alternative would be to specify screen dimension when establishing
> > the 
> > remote control session.
> 
> Might work, but then you also need to cover the case where the screen
> resolution changes during remote control.

Yes, that, with an update whenever the screen changes, is probably the best 
option.
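
A minimal sketch of how this could work, in Python. Nothing here is specified 
yet: the class, its methods and the idea of a dedicated "screen size update" 
message are assumptions for illustration. The controlling client keeps the 
last announced screen size and maps a point in its (possibly scaled) video 
view to integer screen pixels.

```python
class ScreenGeometry:
    """Last screen size announced by the controlled device (hypothetical)."""

    def __init__(self, width, height):
        self.update(width, height)

    def update(self, width, height):
        """Apply a screen size update, e.g. after a resolution change."""
        if width <= 0 or height <= 0:
            raise ValueError("screen dimensions must be positive")
        self.width = width
        self.height = height

    def from_video(self, vx, vy, video_width, video_height):
        """Map an integer point in the displayed video to screen pixels."""
        x = vx * self.width // video_width
        y = vy * self.height // video_height
        # Clamp so that points on the far border stay inside the screen.
        x = min(max(x, 0), self.width - 1)
        y = min(max(y, 0), self.height - 1)
        return x, y


geometry = ScreenGeometry(1920, 1080)
# Video scaled down to 640x360 to save bandwidth; touch at its center.
print(geometry.from_video(320, 180, 640, 360))  # (960, 540)

# The controlled device announces a new resolution.
geometry.update(1280, 720)
print(geometry.from_video(320, 180, 640, 360))  # (640, 360)
```

The video resolution can then change freely with the available bandwidth 
without affecting the coordinates sent over the wire.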

> [SNIP]
> The Web API uses double because they did weird things for HiDPI. On the
> hardware layer, there are only pixels and if you click on a point on
> the screen, it will always be on a pixel (at least in all OS that I am
> aware of). The transformation of HiDPI in browsers abstract away from
> actual pixels and 1px might be more or less than a physical pixel. But
> why would you want to carry this abstraction through the network to a
> system that shouldn't care about what browsers can do and what they
> think a pixel is?

I have no strong argument against this to be honest. I'm fine with int too.

 
> > It was just to handle the case where no device is accepted, there was
> > 2 
> > options:
> > - reject it totally
> > - say it's a simple screen share session.
> > 
> > I've chosen the latter one. But indeed, data channel is then useless.
> > Can 
> > change it for the other option.
> 
> We also don't allow Jingle file transfers of no file or RTP contents
> without any codecs. As this protocol is for remote control, it should
> remain entirely unused for screen share only.

Sure, I'll change that.

> > - I'm not hard set on technologies, and I'm OK to get rid of CBOR if
> > there is 
> > consensus on it. I personally still think that it's a superior
> > solution.
> 
> To me the use of CBOR here feels not well motivated, except for obscure
> "better performance" reasons before having done any measurement to back
> that claim. From XMPP perspective, something in a Jingle XML stream
> would be more canonical (because it reuses the stack we already have in
> every XMPP client anyway) and anything diverting from that IMO should
> be well reasoned.
> 
> If you're reasoning that CBOR provides significant performance gain
> over XML, then why is it not a priority to figure out how we use CBOR
> instead of XML everywhere in XMPP (e.g. by creating some XML<>CBOR
> translation and using that as an optional stream feature).

Well, if I had the time and resources, yes, I definitely think that CBOR or 
something similar would be a good serialization format. But let's not go down 
this rabbit hole ;)

> 
> > - regarding using RFB for input events only, I'll have a deeper look
> > at the 
> > spec and evaluate it. It may be an option if it is comparable in ease of
> > implementation, efficiency and flexibility to the current proposal.
> 
> I want to repeat that I haven't verified that RFB is particularly good
> fit for the purpose, I just know it's very popular.

The idea is to check it. I want something flexible, easy to implement, and 
efficient. If RFB or anything else checks those boxes, why not.

> 
> Best,
> Marvin
> _______________________________________________
> Standards mailing list -- standards@xmpp.org
> To unsubscribe send an email to standards-le...@xmpp.org
>


_______________________________________________
Standards mailing list -- standards@xmpp.org
To unsubscribe send an email to standards-le...@xmpp.org
