Hi Goffi,

On Tue, 2024-05-21 at 12:47 +0200, Goffi wrote:
> I know that, I've just ruled out using <message> through the server as it
> has been proposed in another feedback.

Why do you rule that out? Because you don't see a purpose, when my whole
point is that I do see a purpose? Of course I can send whatever CBOR/JSON
you come up with as a base64 blob inside a <message> for my use case, but
then I wonder why not handle it in the first place.

> From a quick glance at the Wikipedia page, I see "In terms of transferring
> clipboard data, "there is currently no way to transfer text outside the
> Latin-1 character set".[5] A common pseudo-encoding extension solves the
> problem by using UTF-8 in an extended format.[2]: § 7.7.27 ", which makes
> me suspicious though.

RFB definitely is old, so these kinds of things are expected. And, while I
see that you added clipboard as a potential future extension, it seems odd
to complain that RFB has a suboptimal implementation of a feature your
proposed XEP currently doesn't have at all.

> One of the design goal of my proposal is to have something really simple
> and straightforward to implement.

RFB isn't really hard to implement either. And there are a ton of
implementations out there already.

> There is no modifier flag used in the specification. There is the key
> value, and the location number. From my tests, it's consistent and
> corresponds to the documentation for the browsers that I've tried (Firefox
> and Chromium).

I know that your specification doesn't transfer the modifier flags,
probably assuming they are superfluous. However, if your browser client
were to naively send the key events it receives as is, without further
checking for plausibility, things will go wrong: I tested pressing the keys
that would logically result in the events meta down, control down, control
up, meta up, and here are the results on different browsers:
https://imgur.com/a/zVxDAVa

From what I understand, the state of keyup and keydown events in the web
API doesn't need to be consistent (e.g. there can be a keydown without a
keyup and vice versa). Do we want the same behavior for this protocol or
something else?
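
To make "checking for plausibility" a bit more concrete, here is a minimal
sketch in Python of what one side could do before forwarding key events. It
is only one possible answer to the question above, not something the XEP
specifies: the names `KeyEvent` and `KeyStateTracker` and the chosen policy
(drop a keyup that has no matching keydown, forward a repeated keydown as
auto-repeat, release whatever is still held on request) are assumptions
made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KeyEvent:
    """Hypothetical key event: event type, key value and location number."""
    type: str          # "keydown" or "keyup"
    key: str           # e.g. "Control", "Meta", "a"
    location: int = 0  # location number, as in the web API


@dataclass
class KeyStateTracker:
    """Tracks which keys are held and filters out implausible events."""
    pressed: set = field(default_factory=set)

    def filter(self, event: KeyEvent) -> list:
        """Return the list of events to forward for one incoming event."""
        ident = (event.key, event.location)
        if event.type == "keydown":
            # A keydown for a key that is already held is treated as
            # auto-repeat and forwarded unchanged (assumed policy).
            self.pressed.add(ident)
            return [event]
        if event.type == "keyup":
            if ident not in self.pressed:
                # keyup without a matching keydown: drop it (assumed policy).
                return []
            self.pressed.discard(ident)
            return [event]
        raise ValueError(f"unknown event type: {event.type!r}")

    def release_all(self) -> list:
        """Synthesize a keyup for every key still held, e.g. on focus loss."""
        events = [KeyEvent("keyup", key, loc) for key, loc in sorted(self.pressed)]
        self.pressed.clear()
        return events


if __name__ == "__main__":
    tracker = KeyStateTracker()
    incoming = [
        KeyEvent("keydown", "Meta", 1),
        KeyEvent("keydown", "Control", 1),
        KeyEvent("keyup", "Meta", 1),
        KeyEvent("keyup", "Alt", 1),  # never pressed, will be dropped
    ]
    forwarded = [out for ev in incoming for out in tracker.filter(ev)]
    forwarded += tracker.release_all()  # releases the still-held Control
    for ev in forwarded:
        print(ev)
```

Whether such filtering belongs on the sending or the receiving side, and
whether stuck keys should be released automatically at all, is exactly the
kind of behavior the specification would need to state explicitly.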

> > I'm not saying there aren't any cases where low-latency is important,
> > where I disagree is that this is the case in all occasions. If you
> > don't have low latency feedback from the remote device, low latency for
> > input is very likely not crucial.
>
> I have the feeling that you only see this specification with the remote
> desktop use case point of view. There are other use cases, and one another
> major one is to use a device as input for another one in the same physical
> location: use of a smartphone as ad-hoc touch pad or gamepad for instance.
> And if low latency is easily achieved, I still don't see the point to have
> other mechanism because in some niche case low latency is not that
> annoying (but still is, it's always annoying).

I think you misunderstood my point. Using a smartphone as a touch pad or
gamepad while playing a game on a screen next to you is a case of
low-latency feedback (you can see the screen with low latency). An example
of where you don't need low latency would be blindly typing into a remote
shell, because you won't get feedback there (except after confirming a
command, which is probably not low latency).

> > Anyway, I remain not convinced that XSF is the place to specify a
> > remote control protocol from scratch (which is what sections 8 and 9 of
> > the XEP are about). Mostly because I feel the XSF does not have the
> > competence for doing so (aka. we will probably do things terribly
> > wrong, due to lack of experience in the field).
>
> Again, it is not from scratch. It's re-using existing protocols, in a
> simple, working, easy-to-implement, and efficient way.

I was talking about the remote control protocol, which is what runs on the
topmost layer (inside the WebRTC data channel or whatever other Jingle
transport is used).
This protocol is mostly from scratch (it's loosely based on web API events,
but then only takes an arbitrarily picked subset of events and event
properties).

> The goal here is to be sure that it will work with web clients, as data
> channels are currently the only way to have direct connection with
> browsers. I can reformulate to only suggest it and get rid of the SHOULD.

Which isn't an issue if web clients are not relevant for my use case. And
honestly, any kind of pointing to "you should support web clients" sounds
weird to me. It certainly is interesting that we can support web clients,
but that really shouldn't seep into unrelated specifications (and this one
totally is unrelated to the web).

> WebRTC has sessions pretty much like Jingle; its ID is what you have in
> the o= line of your SDP.

My point is: Either it's a Jingle session or it's not part of XMPP. Jingle
doesn't use WebRTC. It just happens that WebRTC APIs are somewhat
compatible with Jingle (because they are based on Jingle), but from an XMPP
perspective, you never have WebRTC sessions. I don't know exactly what it
means to be in the same WebRTC session, but whatever you want here, make it
more explicit, because people who don't use WebRTC APIs should not be
required to first read the WebRTC specs (or probably implementations'
source code) to figure out what you mean by that.

> The issue is that video feed is used in this case to get the screen
> dimension. Without it, we can't get touch event which use absolute
> position (while for mouse, there is a relative position mode for exactly
> this use case).

That's a problematic design. As I said, clients might scale the video to
reduce bandwidth use. Dino also has logic to adjust the video resolution of
cameras depending on available bandwidth. And as I understood it, for the
mouse it's not relative to the screen but relative to the previous
position, aka a movement vector, like what touchpads report.
A screen-relative position, where 0,0 is the upper left corner, 0.5,0.5 is
the center of the screen and 1,1 is the lower right corner, would work
independently of the target screen resolution.

> An alternative would be to specify screen dimension when establishing the
> remote control session.

Might work, but then you also need to cover the case where the screen
resolution changes during remote control.

> No, its value is in pixels, the same as for the Web API. Its double
> because pixels can be subdivided (High-DPI displays, transformations). I
> realize that, besides the link to MDN, this is not explicitly stated; I'll
> add a notice in future revisions.

The Web API uses double because they did weird things for HiDPI. On the
hardware layer, there are only pixels, and if you click on a point on the
screen, it will always be on a pixel (at least on all OSes that I am aware
of). The HiDPI transformation in browsers abstracts away from actual
pixels, and 1px might be more or less than a physical pixel. But why would
you want to carry this abstraction through the network to a system that
shouldn't care about what browsers can do and what they think a pixel is?

> It was just to handle the case where no device is accepted, there was 2
> options:
> - reject it totally
> - say it's a simple screen share session.
>
> I've chosen the later one. But indeed, data channel is then useless. Can
> change it for the other option.

We also don't allow Jingle file transfers of no file or RTP contents
without any codecs. As this protocol is for remote control, it should
remain entirely unused for sessions that are screen share only.

> - I'm not hard set on technologies, and I'm OK to get rid of CBOR is there
> is consensus on it. I personally still think that it's a superior
> solution.

To me the use of CBOR here feels not well motivated, except for obscure
"better performance" reasons, claimed before any measurement has been done
to back them up.
From an XMPP perspective, something in a Jingle XML stream would be more
canonical (because it reuses the stack we already have in every XMPP client
anyway), and anything deviating from that should IMO be well reasoned. If
your reasoning is that CBOR provides a significant performance gain over
XML, then why is it not a priority to figure out how we use CBOR instead of
XML everywhere in XMPP (e.g. by creating some XML<>CBOR translation and
using that as an optional stream feature)?

> - regarding using RFB for input events only, I'll have a deeper look at
> the spec and evaluate it. It may be an option it is comparable in ease of
> implementation, efficiency and flexibility to the current proposal.

I want to repeat that I haven't verified that RFB is a particularly good
fit for the purpose; I just know it's very popular.

Best,
Marvin
_______________________________________________
Standards mailing list -- standards@xmpp.org
To unsubscribe send an email to standards-le...@xmpp.org