Re: why not flow control in wl_connection_flush?
On Fri, 1 Mar 2024 11:59:36 +0200 Pekka Paalanen wrote:

> ...
> The real problem here is though, how do you architect your app or
> toolkit so that it can stop and postpone what it is doing with Wayland
> when you detect that the socket is not draining fast enough? You might
> be calling into Wayland using libraries that do not support this.
>
> Returning to the main event loop is the natural place to check and
> postpone, but this whole issue stems from the reality that apps do not
> return to their main loop often enough or do not know how to wait with
> sending even in the main loop.

I am concluding from this discussion that I can't count on clients being constructed so as not to cause problems if they attempt to send too fast.

I think I may add an option to wl_connection_flush in my copy of libwayland so that I can turn on client waiting on flush from an env var. It looks like the change would be pretty small. Unless you think it would be worth making this an MR on its own?

If the client is single-threaded, this will cause the whole client to wait, which probably won't be a problem, considering the type of clients that might try to be that fast. If the client isn't single-threaded, then it may cause a thread to wait that the client doesn't expect to wait, which could be a problem for that client, admittedly.
Re: why not flow control in wl_connection_flush?
On Tue, 27 Feb 2024 11:01:18 +0200 Pekka Paalanen wrote:

> > But suppose I'm writing a client that has the possibility of
> > sending a rapid stream of msgs to the compositor that might be,
> > depending on what other clients are doing, too fast for the
> > compositor to handle, and I'd like to implement some flow control.
> > I don't want the connection to the compositor to sever or for the
> > condition to cause memory consumption without any ability for me to
> > find out about and control the situation. Especially if I could
> > slow down that rapid stream of updates without too high a cost to
> > what my client is trying to accomplish.
> >
> > Is there a way I could do that?
>
> Get the Wayland fd with wl_display_get_fd() and poll it for writable.
> If it's not writable, you're sending too fast.

That's what I was looking for! I think... Maybe?

> That's what any client should always do. Usually it would be prompted
> by wl_display_flush() erroring out with EAGAIN as your cue to start
> polling for writable. It's even documented.

But calling wl_display_flush too often is bad for throughput, right? Isn't it better to let the ring buffer determine itself when to flush, based on being full, and batch-send many msgs? Obviously, sometimes the client has nothing more to send (for a while), so wl_display_flush then makes sense. But in this case, it does have more to send and wants to know if it should attempt to do so or hold back.

I could instead wait for the display fd to be writable before attempting each msg send. But the display fd may be writable merely because the ring buffer hasn't tried flushing yet. But the ring buffer could have less than enough space for the msg I'm about to send. And the socket buffer could have very little space - just enough for it to say it's writable. Which means that sometimes polling the display fd will return writable when an attempt to send a msg is still going to result in ring buffer growth or client disconnect. So...
back to calling wl_display_flush ... sometimes. I guess I could call wl_display_flush after what I think is about 4K worth of msg content. wl_display_flush returns the amount sent, so that helps with the extra state I need to maintain. Is there currently a way I could get the size of the contents in the output ring buffer?
Re: why not flow control in wl_connection_flush?
On Mon, 26 Feb 2024 15:12:23 +0200 Pekka Paalanen wrote:

> ...
> > What is the advantage to having the impacted clients grow their send
> > buffers while waiting? They wait either way.
>
> They are not waiting if they are growing their send buffers.

I meant that they must wait for the UI to update corresponding to the messages they are trying to send to the compositor.

This may also be about my assumption of a threading model, this time for the client. I assume that a client that has some important work to do that is unrelated to updating the display will do that work in a thread distinct from the one dedicated to sending display-related msgs to the compositor. If that's not the case, then indeed causing the client's sending thread to wait could impact some other computation. Which might be bad, depending on what that other computation is trying to do.

But suppose I'm writing a client that has the possibility of sending a rapid stream of msgs to the compositor that might be, depending on what other clients are doing, too fast for the compositor to handle, and I'd like to implement some flow control. I don't want the connection to the compositor to sever or for the condition to cause memory consumption without any ability for me to find out about and control the situation. Especially if I could slow down that rapid stream of updates without too high a cost to what my client is trying to accomplish.

Is there a way I could do that?
Re: why not flow control in wl_connection_flush?
On Fri, 23 Feb 2024 12:12:38 +0200 Pekka Paalanen wrote:

> I would think it to be quite difficult for a compositor to dedicate a
> whole thread for each client.

But that means it is possible that the server cannot catch up for long periods. And that just having a large number of otherwise friendly clients can cause their socket buffers to fill up. And things are worse on systems with more cores.

What is the advantage to having the impacted clients grow their send buffers while waiting? They wait either way.
Re: why not flow control in wl_connection_flush?
Thanks for this response. I am considering adding unbounded buffering to my Wayland middleware project, and wanted to consider the flow control options first. Walking through the reasoning here is very helpful.

I didn't know that there was a built-in expectation that clients would do some of their own flow control. I was also operating under the assumption that blocking flushes from the compositor to one client would not have an impact on other clients (I was assuming an appropriate threading model in compositors).

The client OOM issue, though: A malicious client can do all kinds of things to try to get a DoS, and moving towards OOM would accomplish that as well on systems with sufficient speed disadvantages for thrashing. A buggy client that isn't trying to do anything malicious, but is trapped in a send loop - that would be a case where causing it to wait might be better than allowing it to move towards OOM (and thrash).

On Thu, 22 Feb 2024 11:52:28 +0200 Pekka Paalanen wrote:

> On Wed, 21 Feb 2024 11:08:02 -0500 jleivent wrote:
>
> > Not completely blocking makes sense for the compositor, but why not
> > block the client?
>
> Blocking in clients is indeed less of a problem, but:
>
> - Clients do not usually have requests they *have to* send to the
>   compositor even if the compositor is not responding timely, unlike
>   input events that compositors have; a client can spam surfaces all
>   it wants, but it is just throwing work away if it does it faster than
>   the screen can update. So there is some built-in expectation that
>   clients control their sending.
>
> - I think the Wayland design wants to give full asynchronicity for
>   clients as well, never blocking them unless they explicitly choose
>   to wait for an event. A client might have semi-real-time
>   responsibilities as well.
>
> - A client's send buffer could be infinite. If a client chooses to
>   send requests so fast it hits OOM, it is just DoS'ing itself.
> > For the compositor, wouldn't a timeout in the sendmsg make sense?
>
> That would make both problems: slight blocking multiplied by number of
> (stalled) clients, and overflows. That could lead to jittery user
> experience while not eliminating the overflow problem.
>
> Thanks,
> pq
Re: why not flow control in wl_connection_flush?
Not completely blocking makes sense for the compositor, but why not block the client? For the compositor, wouldn't a timeout in the sendmsg make sense?

On Wed, 21 Feb 2024 16:39:08 +0100 Olivier Fourdan wrote:

> Hi,
>
> On Wed, Feb 21, 2024 at 4:21 PM jleivent wrote:
>
> > I've been looking into some of the issues about allowing the
> > socket's kernel buffer to run out of space, and was wondering why
> > not simply remove MSG_DONTWAIT from the sendmsg call in
> > wl_connection_flush? That should implement flow control by having
> > the sender thread wait until the receiver has emptied the socket's
> > buffer sufficiently.
> >
> > It seems to me that using an unbounded buffer could cause memory
> > resource problems on whichever end was using that buffer.
> >
> > Was removing MSG_DONTWAIT from the sendmsg call considered and
> > abandoned for some reason?
>
> See this thread [1] from 2012, it might give some hint on why
> MSG_DONTWAIT was added with commit b26774da [2].
>
> HTH
> Olivier
>
> [1] https://lists.freedesktop.org/archives/wayland-devel/2012-February/002394.html
> [2] https://gitlab.freedesktop.org/wayland/wayland/-/commit/b26774da
why not flow control in wl_connection_flush?
I've been looking into some of the issues about allowing the socket's kernel buffer to run out of space, and was wondering why not simply remove MSG_DONTWAIT from the sendmsg call in wl_connection_flush? That should implement flow control by having the sender thread wait until the receiver has emptied the socket's buffer sufficiently. It seems to me that using an unbounded buffer could cause memory resource problems on whichever end was using that buffer. Was removing MSG_DONTWAIT from the sendmsg call considered and abandoned for some reason?
Re: protocol rules question: is an array arg of object ids legal?
Thanks for the prompt answer!

On Wed, 27 Dec 2023 18:17:32 + Simon Ser wrote:

> On Wednesday, December 27th, 2023 at 19:09, jleivent wrote:
>
> > Is it legal for a protocol message to contain an array arg where the
> > contents of the array are Wayland object ids? I don't see any
> > instance of this in any current protocol descriptions I have.
>
> Technically nothing prevents this, but this will be pretty awkward
> since client and server will need to convert to/from IDs (plus
> wrapping/unwrapping the wl_proxy for the client) and there won't be
> any type safety. In general it's better to have a request/event
> carrying a single object which can be sent multiple times to
> accumulate a list of objects.
protocol rules question: is an array arg of object ids legal?
Is it legal for a protocol message to contain an array arg where the contents of the array are Wayland object ids? I don't see any instance of this in any current protocol descriptions I have.
Re: aging merge request
Sorry about the typo! It should be: https://gitlab.freedesktop.org/wayland/wayland/-/merge_requests/339

On Sun, 24 Dec 2023 15:03:04 + Joshua Ashton wrote:

> This gives a 404 for me.
>
> On December 19, 2023 8:22:21 PM GMT, jleivent wrote:
>
> >I submitted this merge request on October 8th:
> >
> >https://gitlab.freedesktop.org/wayland/wayland/-/merge_request/339
> >
> >Is there any interest in it?
> >
> >Thanks,
> >Jon
>
> - Joshie 🐸✨
aging merge request
I submitted this merge request on October 8th: https://gitlab.freedesktop.org/wayland/wayland/-/merge_request/339 Is there any interest in it? Thanks, Jon
Re: what are the protocol rules about uniqueness of event and request names?
On Fri, 8 Dec 2023 12:54:35 +0100 Sebastian Wick wrote:

> ...
> I think a more useful thing to do would be to add this restriction (an
> interface cannot have an event and a request with the same name) to
> the documentation and to wayland-scanner.

Also: an event and a request with the same name would probably confuse anyone using WAYLAND_DEBUG.

But: Would changing wayland-scanner to prevent this be backward compatible? Can't someone somewhere already have an event/request pair with the same name in their own private protocol extension?
Re: what are the protocol rules about uniqueness of event and request names?
On Thu, 7 Dec 2023 22:06:07 + David Edmundson wrote:

> The generated C code would be full of conflicts. The
> MY_PROTOCOL_REQUESTEVENT_SINCE_VERSION define for a start.
>
> I think it might compile in C, but other generators exist that might
> not, and it's making life much more confusing than it needs to be. I
> would strongly avoid it.
>
> David

To be clear, I wasn't intending it to sound like I wanted to add an event and a request with the same name myself. I'm writing some middleware that sits between a Wayland compositor and some of its clients, and I would like to know if it might encounter an interface that has an event and a request with the same name.

I think you've answered that it's not a good idea for a protocol author to do that, but it also sounds like it's a possibility that someone could do it anyway, because there's no direct rule against it. So maybe I should take the necessary precautions.

Thanks,
Jonathan
what are the protocol rules about uniqueness of event and request names?
Can a single interface have an event and a request with the same name?
new test hangs in test-compositor.c at waitid - any clues?
I have a new test that is supposed to encounter an error in the server, causing the server to abort the client and end the test. The client is at that point in a sleep waiting to be aborted. Instead, the test hangs (and eventually times out). If I run it under gdb and Ctrl-C break during the hang, I get:

(gdb) bt
#0 0x77e72ac6 in __waitid (idtype=P_PID, id=10135, infop=0x7fffdd70, options=4) at ../sysdeps/unix/sysv/linux/waitid.c:29
#1 0xde10 in handle_client_destroy (data=0x55567730) at ../tests/test-compositor.c:110
#2 0x77fa20fe in wl_event_loop_dispatch_idle (loop=0x55567440) at ../src/event-loop.c:969
#3 0x77fa256c in wl_event_loop_dispatch (loop=0x55567440, timeout=-1) at ../src/event-loop.c:1109
#4 0x77f9ea81 in wl_display_run (display=0x55567350) at ../src/wayland-server.c:1493
#5 0xe814 in display_run (d=0x55567300) at ../tests/test-compositor.c:401
#6 0xcc36 in server_needs_zombies () at ../tests/display-test.c:1884
#7 0xcf80 in run_test (t=0x555666e0 ) at ../tests/test-runner.c:159
#8 0xd559 in main (argc=2, argv=0x7fffe328) at ../tests/test-runner.c:345

[server_needs_zombies is the name of the new test, which I'm using to establish that the server needs zombie resources like the client needs zombie proxies]

Using 'ps xf' I can see that the child client was not a zombie (in the Linux process sense this time, not the Wayland object sense) until the Ctrl-C in gdb, and then immediately becomes a zombie at the Ctrl-C. Continuing in gdb allows the test to terminate with the expected error result:

Continuing.
Client 'snz_client_loop' was killed by signal 2
Client 'snz_client_loop' failed
1 child(ren) failed

In other words, for some reason, the abort signal sent to the client was not delivered until the server (the parent process of the client) got interrupted itself. Has anyone else observed this inability of the test server to deliver the abort signal to its client until it is itself interrupted?
Is there a bug in the test-compositor.c code (or maybe even wayland-server.c)? As a workaround, I had the client exit instead of sleep. But in that case the test passes even though the server encounters the expected error. Is there a way to configure the server such that if it encounters an error, it terminates the test as a failure?
Re: need help writing tests for specific event orderings
On Thu, 5 Oct 2023 13:28:57 +0300 Pekka Paalanen wrote:

> ...
> If you flush the Wayland connection explicitly, you should be able to
> reliably avoid the deadlocks. Flushing is public stable API.

Thanks! I will pattern these tests after the fd_passer display-test, since that is constructed to resemble an actual client-server configuration and interaction more closely than other tests. Also, following fd_passer's lead, they may not need any additional synchronization to force the issues.
Re: need help writing tests for specific event orderings
On Wed, 4 Oct 2023 11:26:02 +0300 Pekka Paalanen wrote:

> ...
> For the forked clients, there is stop_display()/display_resume().
> Maybe that helps?

Maybe, if I understand their usage correctly. Is this right?: A client would send a sequence of requests followed by a stop_display request. Anything the client sends after that stop_display request will not be processed in the server until the server issues a display_resume event.

> ...
> If you limit your direct marshalling to sequences that are
> theoretically allowed, doesn't that already help you prove that all
> those cases are handled correctly?

Yes, as long as everyone believes in the "theoretically allowed" part.

> ...
> But I guess your goal is to see if using the API correctly could ever
> trigger an illegal sequence?

That's the goal.

> ...
> It's also possible to call both server and client APIs from the same
> thread/process on the same Wayland connection, but you need to be
> careful to prove it cannot deadlock. That should be much easier since
> it's all single-threaded, and you just need to make sure the fds have
> data to read and queues have messages to dispatch when you expect
> them.

I've been thinking about that. Deadlock is the issue, though. If my understanding of stop_display/display_resume is correct, I might use that. Thanks.
need help writing tests for specific event orderings
I am trying to write some tests that provoke errors in libwayland, but it doesn't seem to me like the existing test suite provides a mechanism to create specific event orderings that are allowed but not guaranteed by the asynchrony of the protocol. Is that correct? It looks to me like the tests in the test suite that involve a client and a server all fork the client and allow it and the server to run asynchronously, without a way to impose any ordering restriction, but it's hard to tell. If there is a mechanism to use to get specific event orderings, where is it?

I could simulate one side (the side that doesn't encounter the error) by directly marshalling the messages it would send onto the wire to the other side. That might be a suitable unit test for after the error is proven to exist in the field, but it doesn't (conclusively) prove that the error can exist in the field, because of its reliance on simulation tactics. That's my fallback - but is there a better way?

Thanks,
Jon
Re: Questions about object ID lifetimes
On Wed, 27 Sep 2023 11:47:37 +0300 Pekka Paalanen wrote:

> ..
> You just need to tell meson where your build directory is, or cd into
> it first.
>
> $ meson test -C build
>
> or
>
> $ cd build
> $ meson test

Of course!
Re: Questions about object ID lifetimes
On Tue, 26 Sep 2023 11:53:07 +0300 Pekka Paalanen wrote:

> On Mon, 25 Sep 2023 12:05:30 -0400 jleivent wrote:
>
> > How do I get CI/CD capability turned on? I tried building the unit
> > tests locally, but get errors that suggest those tests need to be
> > run in CI. Issue 540 says I need to apply for the guest role - how
> > do I do that?
>
> I don't recall libwayland having anything that needs to be
> specifically run in a CI environment, and if it does, it should
> automatically skip outside of CI environment. Weston does this.
>
> What errors did you get? How did you run them?
>
> 'meson test' is the command.

I get:

$ meson test
ERROR: No such build data file as '/home/jil/gits/wayland-idfix/meson-private/build.dat'.

I used this to build and install it:

$ meson build/ --prefix=/home/jil/gits/wayland-idfix/install/
$ ninja -C build/ install

Since that didn't create the needed meson-private/build.dat, I thought that might get put in by the CI somehow.

> I think applying for the guest role means that you can file an issue
> on the upstream project asking for the permission. At minimum, a
> maintainer needs to know your gitlab handle.

I'll do that.
Re: Questions about object ID lifetimes
On Wed, 20 Sep 2023 10:05:51 -0400 jleivent wrote:

> ..
> Here's a very wild suggestion that would eliminate it and still
> be compatible with Wayland 1. Add a delete_id request without
> modifying the existing protocol.

I have a delete_id request hack, enhanced zombies everywhere, and an LRU ring for zombie reuse (for when there are no delete_id requests) on the server, all with full compatibility maintained and no protocol additions (so it's fully drop-in compatible for clients and servers), building and running in my limited testing on my jonleivent/wayland-idfix fork. My README explains it in depth.

I would like this to eventually become a pull request, but I need to do more testing first. Which brings me to my question: How do I get CI/CD capability turned on? I tried building the unit tests locally, but get errors that suggest those tests need to be run in CI. Issue 540 says I need to apply for the guest role - how do I do that?

Thanks,
Jon
Re: CI/CD privileges for wayland-idfix fork
On Sat, 23 Sep 2023 09:40:20 -0400 jleivent wrote:

> Could I have CI/CD privileges for
> https://gitlab.freedesktop.org/jonleivent/wayland-idfix
>
> Thanks
> Jon

Also: With respect to the caching scheme described in .gitlab-ci.yml, should I change my FDO_DISTRIBUTION_TAG to stay out of the way? Anything else I need to do before CI is turned on?
CI/CD privileges for wayland-idfix fork
Could I have CI/CD privileges for https://gitlab.freedesktop.org/jonleivent/wayland-idfix Thanks Jon
Re: Questions about object ID lifetimes
On Wed, 20 Sep 2023 10:05:51 -0400 jleivent wrote:

> ...
> I'm considering forking libwayland and working on one or both of these
> fixes for my own use, because I don't want to implement some even
> crazier things in middleware to compensate for the server ID reuse
> problem.

I keep getting "An error occurred while forking the project. Please try again." Am I locked out of forking wayland?
Re: Questions about object ID lifetimes
On Wed, 20 Sep 2023 11:30:19 +0300 Pekka Paalanen wrote:

> ..
> > This might help reduce those anomalous messages and be compatible
> > with Wayland 1. Reduce the greediness of object ID reuse by:
> >
> > - not reusing any IDs unless at least some minimum number (256?)
> >   are free
> >
> > - reuse the freed ones in LRU fashion
>
> Yeah, the free list could be a FIFO instead of a LIFO.
>
> > There are other variations of this - the point of all being to
> > increase the time between when any ID becomes free and when it is
> > reused but without causing the ID maps to grow unreasonably large,
> > or causing their maintenance to slow down.
> >
> > Increasing the time delay between freeing and reuse (such as with a
> > higher minimum free threshold above) would probably lead to lower
> > probability of anomalous messages. You could make this tunable
> > through an environment variable.
> >
> > Note that the two sides don't have to agree to use this less-greedy
> > ID allocation for either side to use it - and it's really only
> > important for servers anyway.
>
> I'm wary of solutions that reduce the risk but do not eliminate it. If
> a protocol interface design turns out racy, it would be best to find
> that out sooner than later, and evaluate fixing it. Reproducibility
> helps analysis.

Here's a very wild suggestion that would eliminate it and still be compatible with Wayland 1: add a delete_id request without modifying the existing protocol.

There are (at least) two pairs of ping/pong messages in the base protocol: xdg_wm_base and wl_shell_surface. From what I can tell (but I only have the wlroots code to look at), when the client sends a pong that doesn't correspond to the most recent ping, the server completely ignores it. Also, the serial arg used in pings starts low and is incremented. Also, the servers tend to reset the serial to 0 often. So it never increments very high (even if it never got reset, it's probably never going over 2^31-1).
This means it's possible to use a specially crafted pong as a delete_id request. The client could send a pong with the highest bit on (so it won't accidentally match a real serial and ack a real ping) and the low bits indicating the object ID whose deletion it is acking. The server will, when it deletes one of its own objects, keep around at least the type (interface) until it gets this pong.

There are two versions of this: one is that clients using patched libwayland libs send the pong on their own after seeing the server-side object deletion. Another is that a patched server sends a ping when it wants to reuse an ID, to force a matching pong out of an unpatched client (this one assumes a client won't queue a server-side object deletion and pong the ping before processing the deletion, hence still be able to send anomalous messages involving the deleted ID - so it's risky). If the server is patched to wait for these delete_id pongs, but the client is not, then at best the server could fall back to using a less greedy reuse, as with my previous suggestion. A patched client could signal that it is patched by sending an unsolicited specially crafted pong (serial arg = 0x) early on.

It might be nice to give users the ability to start out with an unpatched libwayland, but if they think they are seeing clients getting killed off due to deleted server IDs in their requests, they could switch to using a patched "unauthorized" libwayland. It probably wouldn't be too hard to write a tool that parses WAYLAND_DEBUG output to see if an issue is due to deleted server IDs. They'd use the patched libwayland at their own risk (but when isn't that the case?), understanding that the "fix" is a bit of a hack.

I'm considering forking libwayland and working on one or both of these fixes for my own use, because I don't want to implement some even crazier things in middleware to compensate for the server ID reuse problem.
Re: Questions about object ID lifetimes
On Tue, 19 Sep 2023 10:02:55 -0400 jleivent wrote:

> ...
> This might help reduce those anomalous messages and be compatible with
> Wayland 1. Reduce the greediness of object ID reuse by:
>
> - not reusing any IDs unless at least some minimum number (256?)
>   are free
>
> - reuse the freed ones in LRU fashion

This also needs something like zombies on the server side. At least retain the type info for a free ID until it is reused.
Re: Questions about object ID lifetimes
On Tue, 19 Sep 2023 16:26:37 +0300 Pekka Paalanen wrote:

> ...
> > But aren't those fast frame updates done through shared fds? Hence
> > not part of the wire protocol, and would not be impacted by
> > increasing the length of messages on the wire?
>
> No. They are messages sent on the wire, telling "there is a new image
> on that other fd I shared with you before, use that now", and so on.
> That is usually a handful of requests per frame.

Didn't realize that.

> I would argue that "speculative" is not the right word here, it was
> never intended.

How about: there are "anomalous" messages and state changes?

> > tl;dr: protocol asynchrony leads to speculation that can result in
> > the two sides disagreeing about the correct state of the world.
>
> We avoid that with careful protocol design in XML. There is exactly
> that kind of situation in the xdg-family of extensions and it is
> solved by sending a serial with the events and acking that serial
> when the client acts on the events.
>
> It's a known caveat.

OK. This might help reduce those anomalous messages and be compatible with Wayland 1. Reduce the greediness of object ID reuse by:

- not reusing any IDs unless at least some minimum number (256?) are free

- reusing the freed ones in LRU fashion

There are other variations of this - the point of all being to increase the time between when any ID becomes free and when it is reused, but without causing the ID maps to grow unreasonably large, or causing their maintenance to slow down. Increasing the time delay between freeing and reuse (such as with a higher minimum-free threshold above) would probably lead to a lower probability of anomalous messages. You could make this tunable through an environment variable.

Note that the two sides don't have to agree to use this less-greedy ID allocation for either side to use it - and it's really only important for servers anyway.
Re: Questions about object ID lifetimes
On Mon, 18 Sep 2023 14:06:51 +0300 Pekka Paalanen wrote:

> On Sat, 16 Sep 2023 12:18:35 -0400 jleivent wrote:
>
> > The easiest fix I can think of is to go full-on half duplex.
> > Meaning that each side doesn't send a single message until it has
> > fully processed all messages sent to it in the order they arrive
> > (thankfully, sockets preserve message order, else this would be
> > much harder). Have you considered half duplex?
>
> Never crossed my mind at least. I can't even imagine how it could be
> implemented through a socket, because both sides must be able to
> spontaneously send a message at any time.

By taking turns. Each side would, after queuing up a batch of messages, add an "Over!" message (from the days of half-duplex radio communications) to the end of that queue, and then send the whole queue (retaining its sequence). Neither side would send a message until it receives the other side's "Over!" message, and until the higher levels above libwayland have had a chance to examine all messages prior to "Over!" in order to avoid sending an inconsistent message, or even committing to a state incompatible with later messages.

> > Certainly, it would mean a loss
> > of some concurrency, hence a potential performance hit. But
> > probably not that much in this case, as most of the message
> > back-and-forth in Wayland occurs at user-interaction speeds, while
> > the speed-needing stuff happens through fd sharing and similar
> > things outside the protocol. I
>
> That user interaction speed can be in the order of a kilohertz, for
> gaming mice, at least in one direction. In the other direction,
> surface update rate is also unlimited, games may want to push out
> frames even if only every tenth gets displayed to reduce latency.
> Also truly tearing screen updates are being developed.

But aren't those fast frame updates done through shared fds? Hence not part of the wire protocol, and would not be impacted by increasing the length of messages on the wire?
> > think it can be made mostly backward compatible. It would probably
> > require some "all done" interaction between libwayland and higher
> > levels on each side, but that's probably (hopefully) not too hard.
> > There may even be a way to automate the "all done" interaction to
> > make this fully backward compatible, because libwayland knows when
> > there are no more messages to be processed on the wire, and it can
> > queue-up the messages on each side before placing them on the wire.
> > It might need to do things like re-order ping/pong messages with
> > respect to the others to make sure the pinging side (compositor)
> > doesn't declare the client dead while waiting. But that seems
> > minor, as long as all such ping/pong pairs are opaque to the
> > remainder of the protocol, hence always commute with other
> > messages.
>
> If you mean adding new ping/pong stuff, that doesn't sound very nice,
> because Wayland also aims to be power efficient: if truly nothing is
> happening, let the processes sleep. Anyone could still wake up any
> time, and send a message.

Not adding. Dealing with the already existing ping/pong pairs (or any new ones that are added). Or any messages that really need to be timely, hence can't wait for messages in front of them to be fully processed. That could apply to any real-time requirements, like the gaming-mice messages you mentioned above. But doing this in general is hard unless the messages are irrelevant to the rest of the protocol (hence commute with everything else), like ping/pong are.

> On Sun, 17 Sep 2023 15:28:04 -0400 jleivent wrote:
>
> > Has altering the wire format to contain all the info needed for
> > unambiguous decoding of each message entirely within libwayland
> > without needing to know the object ID -> type mapping been
> > considered?
>
> Not that I can recall. The wire format is ABI, libwayland is not the
> only implementation of it, so that would be Wayland 2 material.
So no changes to the wire format are possible under any circumstances in Wayland 1? > > > It would make the messages longer, but this seems like it wouldn't > > be very bad for performance because wire message transfer is roughly > > aligned with user interaction speeds. > > We need to be able to deal with at least a few thousand messages per > second easily. > > The overhead seems a bit bad if every message would need to carry its > signature. Encoding more into the message is only needed if there are no destructor request acks (the equivalent of wl_display::delete_id, but in the opposite
Re: Questions about object ID lifetimes
Has altering the wire format to contain all the info needed for unambiguous decoding of each message entirely within libwayland, without needing to know the object ID -> type mapping, been considered? It would make the messages longer, but this seems like it wouldn't be very bad for performance, because wire message transfer is roughly aligned with user interaction speeds. Also, for any compositor/client pair, as long as they both use the same version of libwayland, the necessary wire format change would not result in compatibility issues. It would for statically linked cases, or similar mismatching cases (flatpak, appimage, snap, etc., unless the host version is mapped in instead of the packaged one somehow). There also seem to be unused bits in the existing wire format, so that one could detect a compositor/client incompatibility at least on one end. I'm not suggesting that unambiguous decoding of all messages is a sufficient fix, but it is a necessary one. There are still speculative computation issues that it wouldn't resolve alone.
Re: Questions about object ID lifetimes
Pekka, After thinking more about what you said, I'm no longer optimistic. First, you are correct that my observation about opposite-side destruction (a side A-ranged ID vs. a side B destructor) only works for middleware, and then only if the compositor and clients already handle their issues properly. Secondly, when thinking about the case of a message with new_ids in it that arrives after an object has been deleted, it occurs to me that this is a special case of a greater problem due to the existence of speculative computation as a result of the protocol's asynchrony. Any time there are at least two messages that don't commute with each other (and destruction is a case of a message that never commutes with any other message to the same object) where the two messages can be sent from opposite sides, at least one of them has to be undone somehow. And that undoing has to include state changes that preceded it on its sending side that didn't take into account the other (non-undone) message. This is bad. It wouldn't be so bad if the protocol used some old-time mutexes or database read-vs-write transactional consistency preservation mechanisms. But those require quite a bit of input from higher levels (above libwayland). And there's deadlock to deal with. The easiest fix I can think of is to go full-on half duplex. Meaning that each side doesn't send a single message until it has fully processed all messages sent to it in the order they arrive (thankfully, sockets preserve message order, else this would be much harder). Have you considered half duplex? Certainly, it would mean a loss of some concurrency, hence a potential performance hit. But probably not that much in this case, as most of the message back-and-forth in Wayland occurs at user-interaction speeds, while the speed-needing stuff happens through fd sharing and similar things outside the protocol. I think it can be made mostly backward compatible.
It would probably require some "all done" interaction between libwayland and higher levels on each side, but that's probably (hopefully) not too hard. There may even be a way to automate the "all done" interaction to make this fully backward compatible, because libwayland knows when there are no more messages to be processed on the wire, and it can queue up the messages on each side before placing them on the wire. It might need to do things like re-order ping/pong messages with respect to the others to make sure the pinging side (compositor) doesn't declare the client dead while waiting. But that seems minor, as long as all such ping/pong pairs are opaque to the remainder of the protocol, hence always commute with other messages. As for my own middleware project, I think I will try to detect message decoding issues in all cases by keeping the most recent two types of each ID, and attempting to decode both ways (most recent first). There are fortunately a bunch of internal consistency checks that can be done, such as length of overall message vs. length of args vs. string length vs. null string termination, etc. But if the middleware gets a message that passes these decoding consistency checks for both of those types, then depending on what it is trying to do (as in one of my use cases, securing a sandboxed application), it may have to cut off the client.
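Here is a toy sketch of that fallback decoding idea. The names are mine, and the decode_as callback is a hypothetical stand-in for the real internal consistency checks (message length vs. argument layout, string NUL termination, and so on):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The two most recent interface types seen bound to one object ID. */
struct id_types {
    const char *recent;    /* type from the most recent new_id binding */
    const char *previous;  /* type from the binding before that */
};

/* Record a new binding for an ID, demoting the old type. */
static void id_types_bind(struct id_types *t, const char *type) {
    t->previous = t->recent;
    t->recent = type;
}

/* Stand-in for running the consistency checks against one candidate. */
typedef bool (*decode_fn)(const char *type, const void *msg);

/* Try the most recent type first, then fall back to the previous one.
 * Returns the type the message decoded under, or NULL on failure.  If
 * both candidates were to pass, the ambiguity must be handled above. */
static const char *try_decode(const struct id_types *t,
                              decode_fn decode_as, const void *msg) {
    if (t->recent && decode_as(t->recent, msg))
        return t->recent;
    if (t->previous && decode_as(t->previous, msg))
        return t->previous;
    return NULL;
}

/* Toy checker for the demo: pretend only "wl_surface" messages pass. */
static bool demo_decode(const char *type, const void *msg) {
    (void)msg;
    return strcmp(type, "wl_surface") == 0;
}
```

The point of the two-deep history is exactly the race described above: a message encoded against the old binding may still be in flight when the ID is rebound.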
Re: Questions about object ID lifetimes
On Thu, 14 Sep 2023 16:32:06 +0300 Pekka Paalanen wrote: > ... > > congratulations, I think you may have found everything that is not > quite right in the fundamental Wayland protocol design. :-) Oh, you flatter me. I'm sure there's more! > > As an aside, we collect unfixable issues under > https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next > These are issues that are either impossible or very difficult or > annoying to fix while keeping backward compatibility with both servers > and clients. Only 7 of them? > > -- > > Object ID re-use is what I would call "aggressive": in the libwayland > C implementation, the object ID last freed is the first one to be > allocated next. There are two separate allocation ranges each with its > own free list: server and client allocated IDs. After I sent the initial post, I realized that the two separate ID ranges help in the following way: For any object ID in the allocation range of side A, a destructor message from side B does not need acknowledgement. This is because B can't introduce a new object bound to that ID, only A can. Hence, any new_id arg for that ID is an acknowledgement of the destruction. However, B has to be careful to ignore messages containing that ID until it sees one with the ID as a new_id arg. After the destructor message from B but before a subsequent new_id for that ID from A, B should not use the ID as an argument to other messages (and attempts to do so can be dropped). And this can be automated provided the destructor tag can be relied on. > > The C implementation also poses an additional restriction: a new ID > can be at most the largest ever allocated ID + 1. > > All this is to keep the ID map as compact as possible without a hash > table. These details are in the implementation of the private 'struct > wl_map' in libwayland. Obviously, that helps middleware as well, for the same reasons. It also makes more automatic error detection possible. > ... 
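A toy model of that allocation discipline (this is not the real 'struct wl_map', just the behavior described above: last freed is reallocated first, and a fresh ID is at most the largest ever allocated + 1):

```c
#include <assert.h>
#include <stdint.h>

/* Small fixed stack standing in for the per-range free list. */
#define FREE_MAX 16

struct id_alloc {
    uint32_t next;                  /* largest-ever allocated ID + 1 */
    uint32_t free_stack[FREE_MAX];  /* freed IDs, most recent on top */
    int free_top;
};

/* Allocate: reuse the most recently freed ID, else extend by one. */
static uint32_t id_alloc_new(struct id_alloc *a) {
    if (a->free_top > 0)
        return a->free_stack[--a->free_top];
    return a->next++;
}

/* Free: push onto the free list so it is the next ID handed out. */
static void id_alloc_free(struct id_alloc *a, uint32_t id) {
    if (a->free_top < FREE_MAX)
        a->free_stack[a->free_top++] = id;
}
```

One allocator instance would exist per range (client-allocated and server-allocated IDs), which is what makes the "any new_id for a side-A ID must come from side A" argument work.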
> > Your whole above analysis is completely correct! I was rather hoping things would turn out less complex than they seemed... > > > However, the other cases are not as easy to identify. > > > > The other cases are: > > 1. an object created by a client request that has destructor events > > 2. an object created by the compositor > > > > It might be true that case 1 does not exist. Is there a general > > rule ensuring that such cases will never be considered in future > > expansions of the Wayland protocol? > > Destructor events do exist. Tagging them as such in the XML was not > done from the beginning though, it was added later in a > backward-compatible manner which makes the tag more informational than > something libwayland could automatically process. The foremost example > is wl_callback.done event. This is only safe because it is guaranteed > that the client cannot be sending a request on wl_callback at the same > time the server is sending 'done' and destroying the object: > wl_callback has no requests defined at all. Fortunately, my point above about the advantage of the separate ID ranges helps here. If wl_callback is created by the client, then a wl_callback.done event tagged as a destructor does not need acknowledgement AND is always safe provided that messages involving the wl_callback ID (other than its eventual reuse as a new_id arg) are ignored above libwayland. But again, this means the destructor tag is important and not merely informational. I did notice that the destructor tagging was added mostly (or solely) to help with code generation by wayland-scanner implementations in programming languages where destructors require some specific syntactic notation. But maybe destructor tagging is even better than that? 
Maybe it would allow libwayland to automate more in a more robust way AND also allow for middleware that doesn't have to simulate all of the semantic level interactions induced by protocol messages in order to merely keep track of how to decode messages. > > It also requires that nothing passes an existing wl_callback object as > an argument in any request. We have been merely lucky that no-one has > done that. It's really hard to imagine a use case where you would want > to pass an existing wl_callback to anything. Again, the point above about separate ID ranges addresses this, I think. > > Extensions may have similar objects that only deliver some one-off > events and then "self-destruct" by the final event. All this is simply > documented and not marked in the XML. That's what I was hoping to avoid. If there are object types where object lifetime can only be understood by simulating all of the relevant semantic content of the messages involved, then that's not good for middleware. Isn't it also problematic towards the goals of libwayland, because it makes it impossible for libwayland to ensure that messages are properly decoded without trusting that the client and/or compositor have implemen
Questions about object ID lifetimes
Forgive the long post. Tl;dr: what are the rules of object ID lifetime and reuse in the Wayland protocol? I am attempting to understand the rules of object ID lifetime within the Wayland protocol in order to construct Wayland middleware (similar to some of the tools featured on https://wayland.freedesktop.org/extras.html). I could not find a comprehensive discussion of the details online. If one exists, I would greatly appreciate a link! Middleware tools that wish to decode Wayland messages sent between the compositor and its clients need to maintain an accurate mapping between object ID and object interface (type). This is needed because the wire protocol's message header includes only the target object ID and an opcode that is relative to the object's type (the message header also includes the message length - about which I also have questions - to be pursued later...). The message (request or event) and its argument encoding can only be determined if the object ID -> type and type + opcode -> message mappings are accurately maintained. The type + opcode -> message mapping is static and can be extracted offline from the protocol XML files. Since object IDs can be reused, it is important for the middleware to understand when an ID can be reused and when it cannot, in order to avoid errors in the ID -> type mapping. Because the Wayland protocol is asynchronous, any message that implies destruction of an object should be acknowledged by the receiver before the destroyed object's ID is reused. Fortunately, certain events and requests have been tagged as destructors in the protocol descriptions! 
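For reference, the wire message header is two native-endian 32-bit words: the target object ID, then a word carrying the total message size in bytes in its upper 16 bits and the opcode in its lower 16 bits. A minimal decoding sketch (the struct and function names are my own):

```c
#include <assert.h>
#include <stdint.h>

/* Decoded Wayland wire message header. */
struct wire_header {
    uint32_t object_id;  /* target object; interface NOT identified here */
    uint16_t size;       /* total message length in bytes, header included */
    uint16_t opcode;     /* meaningful only relative to the object's type */
};

/* Split the two header words into their fields.  Note that nothing in
 * the header names the object's interface: that is exactly why the
 * ID -> type mapping discussed above must be maintained externally. */
static struct wire_header decode_header(const uint32_t words[2]) {
    struct wire_header h;
    h.object_id = words[0];
    h.size = (uint16_t)(words[1] >> 16);
    h.opcode = (uint16_t)(words[1] & 0xffff);
    return h;
}
```
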
Also fortunately, it appears (based on reading the wl_resource_destroy code in wayland-server.c) that for many object IDs, specifically for IDs of objects created by a client request (the ID appears as a new ID arg of a request, and is thus in the client side of the ID range) and for which the client makes a destructor request, the compositor will always send a wl_display::delete_id event (assuming the display_resource still exists for the client, which apparently would only not be the case after the client connection is severed) to acknowledge the destructor request. Any attempt to reuse that ID prior to the wl_display::delete_id event can lead to confusion, and should be avoided. Reuse of the ID after the wl_display::delete_id event should not result in any confusion. [BTW: for the purpose of this discussion, an object is "created" when it is introduced into a protocol message for the first time via a new_id argument. It does not refer to the actual allocation of the object in memory or to its initialization.] However, the other cases are not as easy to identify. The other cases are: 1. an object created by a client request that has destructor events 2. an object created by the compositor It might be true that case 1 does not exist. Is there a general rule ensuring that such cases will never be considered in future expansions of the Wayland protocol? For objects created by the compositor, there are 2 subcases: 2a. objects with only destructor events 2b. objects with destructor requests Again, it might be the case that 2b does not exist, as it is analogous to case 1 above. But, is there a general rule against such future cases as well? Combining 1 and 2b, is there a general rule that says that only the object creator can initiate an object's destruction (unprovoked by the other side of the protocol)? 
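That reuse rule can be sketched as a tiny per-ID state machine (the names are mine, and one pending destruction per ID is assumed): after the client's destructor request the ID is a "zombie" that must not be rebound until wl_display::delete_id acknowledges the destruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-ID lifecycle as seen by middleware tracking the ID -> type map. */
enum id_state { ID_FREE, ID_LIVE, ID_ZOMBIE };

/* An ID may only be bound to a new object once fully released. */
static bool may_reuse(enum id_state s) { return s == ID_FREE; }

/* Client-side destructor request observed for this ID. */
static enum id_state on_destructor_request(enum id_state s) {
    return s == ID_LIVE ? ID_ZOMBIE : s;
}

/* wl_display::delete_id observed for this ID: destruction acknowledged. */
static enum id_state on_delete_id(enum id_state s) {
    return s == ID_ZOMBIE ? ID_FREE : s;
}
```

This covers only the well-behaved case described above (client-created ID, client destructor request); the harder cases in the list that follows don't have a delete_id-style acknowledgement to drive the ZOMBIE -> FREE transition.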
For object IDs created by the compositor and with only destructor events (case 2a), it may be necessary to understand the details of each interface in question to decide when the ID can be reused, as there is no universal destructor acknowledgement request comparable to the wl_display::delete_id event. A requirement to understand the details to that level would make middleware development more difficult. Insert extreme sadness emoji here. Thankfully, it seems that destructor events are themselves acknowledgements of requests for destruction by the client (such as the wp_drm_lease_device_v1::released destructor event vs. the wp_drm_lease_device_v1::release request), or involve objects with a very limited lifetime and usage, such as callbacks (wp_presentation_feedback, zwp_linux_buffer_release, and zwp_fullscreen_shell_mode_feedback_v1). These limited lifetime/usage objects are created with the knowledge that all messages for them are destructor events, and that they are not involved in any other messages (as targets or arguments). Hence their destruction needs no further acknowledgement because the request for destruction was implied by their creation. The destructor event is the acknowledgement of that request. Is this a general rule: that a destructor event is always the acknowledgement of a (perhaps implied) destruction request? So there may be two general simple rules that the middleware can follow to maintain a proper ID -> type mapping through ID reuse cycles: 1. reuse of ID is allowed after w