Re: Questions about object ID lifetimes
On Wed, 27 Sep 2023 11:47:37 +0300 Pekka Paalanen wrote: > .. > > You just need to tell meson where your build directory is, or cd into > it first. > > $ meson test -C build > > or > > $ cd build > $ meson test > Of course!
Re: Questions about object ID lifetimes
On Tue, 26 Sep 2023 10:56:03 -0400 jleivent wrote: > On Tue, 26 Sep 2023 11:53:07 +0300 > Pekka Paalanen wrote: > > > On Mon, 25 Sep 2023 12:05:30 -0400 > > jleivent wrote: > > > > > How do I get CI/CD capability turned on? I tried building the unit > > > tests locally, but get errors that suggest those tests need to be > > > run in CI. Issue 540 says I need to apply for the guest role - how > > > do I do that? > > > > I don't recall libwayland having anything that needs to be > > specifically run in a CI environment, and if it does, it should > > automatically skip outside of CI environment. Weston does this. > > > > What errors did you get? How did you run them? > > > > 'meson test' is the command. > > I get: > > $ meson test > > ERROR: No such build data file as > '/home/jil/gits/wayland-idfix/meson-private/build.dat'. > > I used this to build and install it: > > $ meson build/ --prefix=/home/jil/gits/wayland-idfix/install/ > $ ninja -C build/ install > > Since that didn't create the needed meson-private/build.dat, I thought > that might get put in by the CI somehow. You just need to tell meson where your build directory is, or cd into it first. $ meson test -C build or $ cd build $ meson test Thanks, pq
Re: Questions about object ID lifetimes
On Tue, 26 Sep 2023 11:53:07 +0300 Pekka Paalanen wrote: > On Mon, 25 Sep 2023 12:05:30 -0400 > jleivent wrote: > > > How do I get CI/CD capability turned on? I tried building the unit > > tests locally, but get errors that suggest those tests need to be > > run in CI. Issue 540 says I need to apply for the guest role - how > > do I do that? > > I don't recall libwayland having anything that needs to be > specifically run in a CI environment, and if it does, it should > automatically skip outside of CI environment. Weston does this. > > What errors did you get? How did you run them? > > 'meson test' is the command. I get: $ meson test ERROR: No such build data file as '/home/jil/gits/wayland-idfix/meson-private/build.dat'. I used this to build and install it: $ meson build/ --prefix=/home/jil/gits/wayland-idfix/install/ $ ninja -C build/ install Since that didn't create the needed meson-private/build.dat, I thought that might get put in by the CI somehow. > > I think applying for the guest role means that you can file an issue > on the upstream project asking for the permission. At minimum, a > maintainer needs to know your gitlab handle. I'll do that.
Re: Questions about object ID lifetimes
On Mon, 25 Sep 2023 12:05:30 -0400 jleivent wrote: > How do I get CI/CD capability turned on? I tried building the unit > tests locally, but get errors that suggest those tests need to be run > in CI. Issue 540 says I need to apply for the guest role - how do I do > that? I don't recall libwayland having anything that needs to be specifically run in a CI environment, and if it does, it should automatically skip outside of CI environment. Weston does this. What errors did you get? How did you run them? 'meson test' is the command. I think applying for the guest role means that you can file an issue on the upstream project asking for the permission. At minimum, a maintainer needs to know your gitlab handle. All this permission hassle is just to avoid people that want to steal CPU time from CI runners for unrelated or unwanted purposes (like building a complete Android OS image from scratch or cryptomining). Thanks, pq
Re: Questions about object ID lifetimes
On Wed, 20 Sep 2023 10:05:51 -0400 jleivent wrote: > .. > Here's a very wild suggestion that would eliminate it and still > be compatible with Wayland 1. Add a delete_id request without > modifying the existing protocol. I have a delete_id request hack, enhanced zombies everywhere, an LRU ring for zombie reuse (when there are no delete_id requests) on the server, all with full compatibility maintained and no protocol additions (so it's fully drop-in compatible for clients and servers) building and running in my limited testing on my jonleivent/wayland-idfix fork. My README explains it in depth. I would like this to eventually become a pull request, but I need to do more testing first. Which brings me to my question: How do I get CI/CD capability turned on? I tried building the unit tests locally, but get errors that suggest those tests need to be run in CI. Issue 540 says I need to apply for the guest role - how do I do that? Thanks, Jon
Re: Questions about object ID lifetimes
On 9/20/23 18:29, jleivent wrote: > On Wed, 20 Sep 2023 10:05:51 -0400 > jleivent wrote: > >> ... >> I'm considering forking libwayland and working on one or both of these >> fixes for my own use, because I don't want to implement some even >> crazier things in middleware to compensate for the server ID reuse >> problem. >> > > I keep getting "An error occurred while forking the project. Please try > again." > > Am I locked out of forking wayland? Has your account been verified per https://gitlab.freedesktop.org/freedesktop/freedesktop/-/wikis/home#warning-restrictions-due-to-spam-warning ? -- Earthling Michel Dänzer| https://redhat.com Libre software enthusiast | Mesa and Xwayland developer
Re: Questions about object ID lifetimes
On Wed, 20 Sep 2023 10:05:51 -0400 jleivent wrote: > ... > I'm considering forking libwayland and working on one or both of these > fixes for my own use, because I don't want to implement some even > crazier things in middleware to compensate for the server ID reuse > problem. > I keep getting "An error occurred while forking the project. Please try again." Am I locked out of forking wayland?
Re: Questions about object ID lifetimes
On Wed, 20 Sep 2023 11:30:19 +0300 Pekka Paalanen wrote: > .. > > This might help reduce those anomalous messages and be compatible > > with Wayland 1. Reduce the greediness of object ID reuse by: > > > > - not reusing any IDs unless at least some minimum number (256?) > > are free > > > > - reuse the freed ones in LRU fashion > > Yeah, the free list could be a FIFO instead of a LIFO. > > > There are other variations of this - the point of all being to > > increase the time between when any ID becomes free and when it is > > reused but without causing the ID maps to grow unreasonably large, > > or causing their maintenance to slow down. > > > > Increasing the time delay between freeing and reuse (such as with a > > higher minimum free threshold above) would probably lead to lower > > probability of anomalous messages. You could make this tunable > > through an environment variable. > > > > Note that the two sides don't have to agree to use this less-greedy > > ID allocation for either side to use it - and it's really only > > important for servers anyway. > > I'm wary of solutions that reduce the risk but do not eliminate it. If > a protocol interface design turns out racy, it would be best to find > that out sooner than later, and evaluate fixing it. Reproducibility > helps analysis. > Here's a very wild suggestion that would eliminate it and still be compatible with Wayland 1. Add a delete_id request without modifying the existing protocol. There are (at least) two pairs of ping/pong messages in the base protocol: xdg_wm_base and wl_shell_surface. From what I can tell (but I only have the wlroots code to look at), when the client sends a pong that doesn't correspond to the most recent ping, the server completely ignores it. Also, the serial arg used in pings starts low and is incremented. Also, the servers tend to reset the serial to 0 often. So it never increments very high (even if it never got reset, it's probably never going over 2^31-1). 
This means it's possible to use a specially crafted pong as a delete_id request. The client could send a pong with the highest bit on (so it won't accidentally match a real serial and ack a real ping) and the low bits indicating the object ID whose deletion it is acking. The server will, when it deletes one of its own objects, keep around at least the type (interface) until it gets this pong. There are two versions of this: one is that clients using patched libwayland libs send the pong on their own after seeing the server-side object deletion. Another is that a patched server sends a ping when it wants to reuse an ID to force a matching pong of an unpatched client (this one assumes a client won't queue a server-side object deletion and pong the ping before processing the deletion, and thus still be able to send anomalous messages involving the deleted ID, so it's risky). If the server is patched to wait for these delete_id pongs, but the client is not, then at best the server could fall back to less greedy reuse as with my previous suggestion. A patched client could signal it is patched by sending an unsolicited specially crafted pong (serial arg = 0x) early on. It might be nice to give users the ability to start out with an unpatched libwayland, but if they think they are seeing clients getting killed off due to deleted server IDs in their requests, they could switch to using a patched "unauthorized" libwayland. It probably wouldn't be too hard to write a tool that parses WAYLAND_DEBUG output to see if an issue is due to deleted server IDs. They'd use the patched libwayland at their own risk (but when isn't that the case?), understanding that the "fix" is a bit of a hack. I'm considering forking libwayland and working on one or both of these fixes for my own use, because I don't want to implement some even crazier things in middleware to compensate for the server ID reuse problem.
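A minimal sketch of how such a pong serial could be packed and unpacked, assuming the ack is only ever sent for server-allocated IDs (libwayland hands those out from 0xff000000 upward), so the low 24 bits are enough to recover the ID. The function names and the exact bit layout are hypothetical; nothing here is part of libwayland or any protocol XML.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: top bit marks a delete_id ack, the low 24 bits hold
 * the offset of a server-allocated ID. Not part of libwayland. */
#define DELETE_ACK_FLAG 0x80000000u
#define SERVER_ID_START 0xff000000u /* first server-allocated ID in libwayland */
#define SERVER_ID_MASK  0x00ffffffu

/* Patched client: serial to send in a pong acking deletion of server_id. */
static uint32_t delete_ack_encode(uint32_t server_id)
{
    assert(server_id >= SERVER_ID_START);
    return DELETE_ACK_FLAG | (server_id & SERVER_ID_MASK);
}

/* Patched server: true if the pong serial is a delete_id ack (real ping
 * serials are assumed to stay below 2^31); *server_id gets the acked ID. */
static bool delete_ack_decode(uint32_t serial, uint32_t *server_id)
{
    if ((serial & DELETE_ACK_FLAG) == 0)
        return false; /* ordinary pong for a real ping */
    if ((serial & ~(DELETE_ACK_FLAG | SERVER_ID_MASK)) != 0)
        return false; /* bits 24..30 are unused by this layout */
    *server_id = SERVER_ID_START | (serial & SERVER_ID_MASK);
    return true;
}
```

An unpatched server never sees a matching ping for these serials and, per the behavior observed in wlroots, simply ignores them, which is what makes the hack backward compatible.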
Re: Questions about object ID lifetimes
On Tue, 19 Sep 2023 10:02:55 -0400 jleivent wrote: > On Tue, 19 Sep 2023 16:26:37 +0300 > Pekka Paalanen wrote: > > > ... > > > But aren't those fast frame updates done through shared fds? Hence > > > not part of the wire protocol, and would not be impacted by > > > increasing the length of messages on the wire? > > > > No. They are messages sent on the wire, telling "there is a new image > > on that other fd I shared with you before, use that now", and so on. > > That is usually a handful of requests per frame. > > Didn't realize that. > > > > I would argue that "speculative" is not the right word here, it was > > never intended. > > How about: there are "anomalous" messages and state changes? I believe we tend to call them just race conditions, or racy messages. Btw. about grouping messages, we hand-roll protocol for that too: there is wl_surface.commit to latch a bunch of state, and the shell related extensions have the final 'configure' event, and it's common to have a 'done' event on an interface that splits sending state into multiple events. Several input interfaces have 'frame' event for the same. This helps the receiving side to wait for the complete state transmission before acting on it. It is inconvenient to have to design all this by hand every time in a new interface, and I agree it would be nice if the wire protocol foundation offered a solution somehow, but I'm not sure how that should look in the hypothetical Wayland 2. > > > tl;dr: protocol asynchrony leads to speculation that can result in > > > the two sides disagreeing about the correct state of the world. > > > > We avoid that with careful protocol design in XML. There is exactly > > that kind of situation in the xdg-family of extensions and it is > > solved by sending a serial with the events and acking that serial > > when the client acts on the events. > > > > It's a known caveat. > > OK. > > This might help reduce those anomalous messages and be compatible with > Wayland 1. 
Reduce the greediness of object ID reuse by: > > - not reusing any IDs unless at least some minimum number (256?) > are free > > - reuse the freed ones in LRU fashion Yeah, the free list could be a FIFO instead of a LIFO. > There are other variations of this - the point of all being to increase > the time between when any ID becomes free and when it is reused but > without causing the ID maps to grow unreasonably large, or causing their > maintenance to slow down. > > Increasing the time delay between freeing and reuse (such as with a > higher minimum free threshold above) would probably lead to lower > probability of anomalous messages. You could make this tunable through > an environment variable. > > Note that the two sides don't have to agree to use this less-greedy ID > allocation for either side to use it - and it's really only important > for servers anyway. I'm wary of solutions that reduce the risk but do not eliminate it. If a protocol interface design turns out racy, it would be best to find that out sooner than later, and evaluate fixing it. Reproducibility helps analysis. Thanks, pq
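The hand-rolled grouping Pekka mentions ('done', 'frame', wl_surface.commit) is usually handled on the receiving side by double-buffering: individual events only touch pending state, and the terminating event latches it. A minimal C sketch of that pattern; it is loosely modeled on wl_output's mode/scale events followed by done, but the struct and handler names are hypothetical and it does not use the libwayland API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mode_state {
    int32_t width, height;
    int32_t scale;
};

struct output_listener_state {
    struct mode_state pending; /* written by individual events */
    struct mode_state current; /* only updated when 'done' arrives */
    bool have_current;
};

static void on_mode(struct output_listener_state *s, int32_t w, int32_t h)
{
    s->pending.width = w;
    s->pending.height = h;
}

static void on_scale(struct output_listener_state *s, int32_t scale)
{
    assert(scale > 0);
    s->pending.scale = scale;
}

/* 'done' ends an atomic batch: act on the state only now. */
static void on_done(struct output_listener_state *s)
{
    s->current = s->pending;
    s->have_current = true;
}
```

Code that reads `current` never observes a half-updated mix of old and new values, which is the point of the terminating event.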
Re: Questions about object ID lifetimes
On Tue, 19 Sep 2023 10:02:55 -0400 jleivent wrote: > ... > This might help reduce those anomalous messages and be compatible with > Wayland 1. Reduce the greediness of object ID reuse by: > > - not reusing any IDs unless at least some minimum number (256?) > are free > > - reuse the freed ones in LRU fashion This also needs something like zombies on the server side. At least retain the type info for a free ID until it is reused.
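A minimal sketch of the server-side "zombie" idea: when an object is destroyed, its map entry keeps the interface until the ID is reused, so a late message naming that ID can still be decoded with the right signature and then dropped. The fixed-size map, the string interface names (libwayland would use `const struct wl_interface *`), and all function names are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 64 /* toy fixed-size map for illustration */

struct id_entry {
    const char *interface; /* last interface bound to this ID, or NULL */
    int live;              /* 1 = live object, 0 = zombie or never used */
};

struct id_map {
    struct id_entry entries[MAP_SIZE];
};

/* Bind id to a new object; only here is the old zombie type forgotten. */
static void map_insert(struct id_map *m, uint32_t id, const char *iface)
{
    assert(id < MAP_SIZE);
    m->entries[id].interface = iface;
    m->entries[id].live = 1;
}

/* Destroy the object but keep its interface as zombie type info. */
static void map_destroy(struct id_map *m, uint32_t id)
{
    if (id < MAP_SIZE)
        m->entries[id].live = 0;
}

/* Interface to decode an incoming message with. *deliver is 1 if the
 * message should be dispatched, 0 if it should be decoded and dropped. */
static const char *map_lookup(const struct id_map *m, uint32_t id, int *deliver)
{
    *deliver = 0;
    if (id >= MAP_SIZE)
        return NULL;
    *deliver = m->entries[id].live;
    return m->entries[id].interface;
}
```

The zombie information is only useful for as long as the ID stays unused, which is why this pairs naturally with a less greedy reuse policy.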
Re: Questions about object ID lifetimes
On Tue, 19 Sep 2023 16:26:37 +0300 Pekka Paalanen wrote: > ... > > But aren't those fast frame updates done through shared fds? Hence > > not part of the wire protocol, and would not be impacted by > > increasing the length of messages on the wire? > > No. They are messages sent on the wire, telling "there is a new image > on that other fd I shared with you before, use that now", and so on. > That is usually a handful of requests per frame. Didn't realize that. > > I would argue that "speculative" is not the right word here, it was > never intended. How about: there are "anomalous" messages and state changes? > > tl;dr: protocol asynchrony leads to speculation that can result in > > the two sides disagreeing about the correct state of the world. > > We avoid that with careful protocol design in XML. There is exactly > that kind of situation in the xdg-family of extensions and it is > solved by sending a serial with the events and acking that serial > when the client acts on the events. > > It's a known caveat. OK. This might help reduce those anomalous messages and be compatible with Wayland 1. Reduce the greediness of object ID reuse by: - not reusing any IDs unless at least some minimum number (256?) are free - reuse the freed ones in LRU fashion There are other variations of this - the point of all being to increase the time between when any ID becomes free and when it is reused but without causing the ID maps to grow unreasonably large, or causing their maintenance to slow down. Increasing the time delay between freeing and reuse (such as with a higher minimum free threshold above) would probably lead to lower probability of anomalous messages. You could make this tunable through an environment variable. Note that the two sides don't have to agree to use this less-greedy ID allocation for either side to use it - and it's really only important for servers anyway.
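The less greedy reuse proposed here can be sketched as a FIFO ring of freed IDs that is only drawn from once at least `min_free` IDs have accumulated; otherwise a fresh ID (highest so far + 1) is handed out, which also respects the largest-ID-plus-one rule of the C implementation. This is a hypothetical illustration, not libwayland's allocator (which, per this thread, reuses the most recently freed ID first).

```c
#include <assert.h>
#include <stdint.h>

#define RING_CAP 1024 /* hypothetical fixed capacity */

struct id_alloc {
    uint32_t ring[RING_CAP];
    uint32_t head, count; /* FIFO: pop at head, push at head + count */
    uint32_t next_fresh;  /* next never-used ID */
    uint32_t min_free;    /* reuse threshold, e.g. 256 */
};

static void id_alloc_init(struct id_alloc *a, uint32_t first_id, uint32_t min_free)
{
    assert(min_free <= RING_CAP); /* otherwise reuse could never happen */
    a->head = a->count = 0;
    a->next_fresh = first_id;
    a->min_free = min_free;
}

static uint32_t id_alloc_get(struct id_alloc *a)
{
    if (a->count > 0 && a->count >= a->min_free) {
        uint32_t id = a->ring[a->head]; /* oldest freed ID first */
        a->head = (a->head + 1) % RING_CAP;
        a->count--;
        return id;
    }
    return a->next_fresh++;
}

static void id_alloc_put(struct id_alloc *a, uint32_t id)
{
    if (a->count == RING_CAP)
        return; /* ring full: this sketch simply never reuses the ID */
    a->ring[(a->head + a->count) % RING_CAP] = id;
    a->count++;
}
```

Raising `min_free` widens the window between free and reuse at the cost of a larger ID map, which is the tunable trade-off described in the message.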
Re: Questions about object ID lifetimes
On Mon, 18 Sep 2023 11:31:18 -0400 jleivent wrote: > On Mon, 18 Sep 2023 14:06:51 +0300 > Pekka Paalanen wrote: > > > On Sat, 16 Sep 2023 12:18:35 -0400 > > jleivent wrote: > > > > > The easiest fix I can think of is to go full-on half duplex. > > > Meaning that each side doesn't send a single message until it has > > > fully processed all messages sent to it in the order they arrive > > > (thankfully, sockets preserve message order, else this would be > > > much harder). Have you considered half duplex? > > > > Never crossed my mind at least. I can't even imagine how it could be > > implemented through a socket, because both sides must be able to > > spontaneously send a message at any time. > > By taking turns. Each side would, after queuing up a batch of > messages, add an "Over!" message (from the days of half-duplex > radio communications) to the end of that queue, and then send the whole > queue (retaining its sequence). Neither side would send a message > until it receives the other side's "Over!" message, and until the > higher levels above libwayland have had a chance to examine all > messages prior to "Over!" in order to avoid sending an inconsistent > message or even committing to a state incompatible with later messages. > > > > > > Certainly, it would mean a loss > > > of some concurrency, hence a potential performance hit. But > > > probably not that much in this case, as most of the message > > > back-and-forth in Wayland occurs at user-interaction speeds, while > > > the speed-needing stuff happens through fd sharing and similar > > > things outside the protocol. I > > > > That user interaction speed can be in the order of a kilohertz, for > > gaming mice, at least in one direction. In the other direction, > > surface update rate is also unlimited, games may want to push out > > frames even if only every tenth gets displayed to reduce latency. > > Also truly tearing screen updates are being developed. 
> > But aren't those fast frame updates done through shared fds? Hence not > part of the wire protocol, and would not be impacted by increasing the > length of messages on the wire? No. They are messages sent on the wire, telling "there is a new image on that other fd I shared with you before, use that now", and so on. That is usually a handful of requests per frame. Likewise, every pointer motion event is one or multiple wire events. Shared fds are used for sharing big chunks of data mostly, that is, shared memory. But we don't use shared memory messaging nor locks. All messaging is Wayland messages over the socket. After all, the XML files describe wire messages. We want everything to be in the same protocol stream as much as possible to reduce race possibilities. If we had shared memory messaging in addition to the unix socket, that would be two mutually async protocol streams in the same direction. That would be quite a pain, as we've learnt from Xwayland (you have both Wayland and X11 connections between the same two entities; as another matter, libX11 is really eager to have blocking roundtrips, so if libwayland would also block for something, a deadlock is practically guaranteed eventually). > > > > > think it can be made mostly backward compatible. It would probably > > > require some "all done" interaction between libwayland and higher > > > levels on each side, but that's probably (hopefully) not too hard. > > > There may even be a way to automate the "all done" interaction to > > > make this fully backward compatible, because libwayland knows when > > > there are no more messages to be processed on the wire, and it can > > > queue-up the messages on each side before placing them on the wire. > > > It might need to do things like re-order ping/pong messages with > > > respect to the others to make sure the pinging side (compositor) > > > doesn't declare the client dead while waiting. 
But that seems > > > minor, as long as all such ping/pong pairs are opaque to the > > > remainder of the protocol, hence always commute with other > > > messages. > > > > If you mean adding new ping/pong stuff, that doesn't sound very nice, > > because Wayland also aims to be power efficient: if truly nothing is > > happening, let the processes sleep. Anyone could still wake up any > > time, and send a message. > > Not adding. Dealing with the already existing (or if any new ones are > added) ping/pong pairs. Or any messages that really need to be timely, > hence can't wait for messages in front of them to be fully processed. There are no existing mandatory ping/pong messages. Some extensions have some, but all extensions are by definition optional from the libwayland point of view. Wayland messages are strictly ordered per direction, there is zero expectation or guarantee that anything could be re-ordered at libwayland level. > That could apply to any real-time requirements, like the gaming mice > messages you mentioned above. But doing this in gen
Re: Questions about object ID lifetimes
On Mon, 18 Sep 2023 14:06:51 +0300 Pekka Paalanen wrote: > On Sat, 16 Sep 2023 12:18:35 -0400 > jleivent wrote: > > > The easiest fix I can think of is to go full-on half duplex. > > Meaning that each side doesn't send a single message until it has > > fully processed all messages sent to it in the order they arrive > > (thankfully, sockets preserve message order, else this would be > > much harder). Have you considered half duplex? > > Never crossed my mind at least. I can't even imagine how it could be > implemented through a socket, because both sides must be able to > spontaneously send a message at any time. By taking turns. Each side would, after queuing up a batch of messages, add an "Over!" message (from the days of half-duplex radio communications) to the end of that queue, and then send the whole queue (retaining its sequence). Neither side would send a message until it receives the other side's "Over!" message, and until the higher levels above libwayland have had a chance to examine all messages prior to "Over!" in order to avoid sending an inconsistent message or even committing to a state incompatible with later messages. > > > Certainly, it would mean a loss > > of some concurrency, hence a potential performance hit. But > > probably not that much in this case, as most of the message > > back-and-forth in Wayland occurs at user-interaction speeds, while > > the speed-needing stuff happens through fd sharing and similar > > things outside the protocol. I > > That user interaction speed can be in the order of a kilohertz, for > gaming mice, at least in one direction. In the other direction, > surface update rate is also unlimited, games may want to push out > frames even if only every tenth gets displayed to reduce latency. > Also truly tearing screen updates are being developed. But aren't those fast frame updates done through shared fds? Hence not part of the wire protocol, and would not be impacted by increasing the length of messages on the wire? 
> > > think it can be made mostly backward compatible. It would probably > > require some "all done" interaction between libwayland and higher > > levels on each side, but that's probably (hopefully) not too hard. > > There may even be a way to automate the "all done" interaction to > > make this fully backward compatible, because libwayland knows when > > there are no more messages to be processed on the wire, and it can > > queue-up the messages on each side before placing them on the wire. > > It might need to do things like re-order ping/pong messages with > > respect to the others to make sure the pinging side (compositor) > > doesn't declare the client dead while waiting. But that seems > > minor, as long as all such ping/pong pairs are opaque to the > > remainder of the protocol, hence always commute with other > > messages. > > If you mean adding new ping/pong stuff, that doesn't sound very nice, > because Wayland also aims to be power efficient: if truly nothing is > happening, let the processes sleep. Anyone could still wake up any > time, and send a message. Not adding. Dealing with the already existing (or if any new ones are added) ping/pong pairs. Or any messages that really need to be timely, hence can't wait for messages in front of them to be fully processed. That could apply to any real-time requirements, like the gaming mice messages you mentioned above. But doing this in general is hard unless the messages are irrelevant to the rest of the protocol (hence commute with everything else), like ping/pong are. > > > On Sun, 17 Sep 2023 15:28:04 -0400 > jleivent wrote: > > > Has altering the wire format to contain all the info needed for > > unambiguous decoding of each message entirely within libwayland > > without needing to know the object ID -> type mapping been > > considered? > > Not that I can recall. The wire format is ABI, libwayland is not the > only implementation of it, so that would be Wayland 2 material. 
So no changes to the wire format are possible under any circumstances in Wayland 1? > > > It would make the messages longer, but this seems like it wouldn't > > be very bad for performance because wire message transfer is roughly > > aligned with user interaction speeds. > > We need to be able to deal with at least a few thousand messages per > second easily. > > The overhead seems a bit bad if every message would need to carry its > signature. Encoding more into the message is only needed if there are no destructor request acks (the equivalent of wl_display::delete_id, but in the opposite direction). But I was wondering why not do it for robustness. The signature isn't very big, but it's probably not needed even for robustness. What's needed is the target object type/version information. Since from that both sides know the signature. The issue is just how to add robustness to the object ID -> type/version mapping, which is the source of many problems. The signatures ar
Re: Questions about object ID lifetimes
On Sat, 16 Sep 2023 12:18:35 -0400 jleivent wrote: > The easiest fix I can think of is to go full-on half duplex. Meaning > that each side doesn't send a single message until it has fully > processed all messages sent to it in the order they arrive (thankfully, > sockets preserve message order, else this would be much harder). > Have you considered half duplex? Never crossed my mind at least. I can't even imagine how it could be implemented through a socket, because both sides must be able to spontaneously send a message at any time. > Certainly, it would mean a loss > of some concurrency, hence a potential performance hit. But probably > not that much in this case, as most of the message back-and-forth in > Wayland occurs at user-interaction speeds, while the speed-needing stuff > happens through fd sharing and similar things outside the protocol. I That user interaction speed can be in the order of a kilohertz, for gaming mice, at least in one direction. In the other direction, surface update rate is also unlimited, games may want to push out frames even if only every tenth gets displayed to reduce latency. Also truly tearing screen updates are being developed. > think it can be made mostly backward compatible. It would probably > require some "all done" interaction between libwayland and higher > levels on each side, but that's probably (hopefully) not too hard. > There may even be a way to automate the "all done" interaction to make > this fully backward compatible, because libwayland knows when there are > no more messages to be processed on the wire, and it can queue-up the > messages on each side before placing them on the wire. It might need > to do things like re-order ping/pong messages with respect to the > others to make sure the pinging side (compositor) doesn't declare the > client dead while waiting. But that seems minor, as long as all such > ping/pong pairs are opaque to the remainder of the protocol, hence > always commute with other messages. 
If you mean adding new ping/pong stuff, that doesn't sound very nice, because Wayland also aims to be power efficient: if truly nothing is happening, let the processes sleep. Anyone could still wake up any time, and send a message. On Sun, 17 Sep 2023 15:28:04 -0400 jleivent wrote: > Has altering the wire format to contain all the info needed for > unambiguous decoding of each message entirely within libwayland without > needing to know the object ID -> type mapping been considered? Not that I can recall. The wire format is ABI, libwayland is not the only implementation of it, so that would be Wayland 2 material. > It would make the messages longer, but this seems like it wouldn't be > very bad for performance because wire message transfer is roughly > aligned with user interaction speeds. We need to be able to deal with at least a few thousand messages per second easily. The overhead seems a bit bad if every message would need to carry its signature. > Also, for any compositor/client pair, as long as they both use the same > version of libwayland, the necessary wire format change would not > result in compatibility issues. It would for static linked cases, > or similar mismatching cases (flatpak, appimage, snap, etc. unless > the host version is mapped in instead of the packaged one somehow). > There also seem to be unused bits in the existing wire format so that > one could detect an a compositor/client incompatibility at least on one > end. We've never had the requirement for compositor and clients to use the same minor version of libwayland. There are also completely independent Wayland implementations in other languages that expect to be interoperable. Breaking all that seems unacceptable. What unused bits did you find? > I'm not suggesting that unambiguous decoding of all messages is a > sufficient fix, but it is a necessary one. There are still speculative > computation issues that it wouldn't resolve alone. I didn't understand what is speculative. 
There is no roll-back of any kind on anything, what's computed is final. Thanks, pq
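To make the half-duplex "Over!" proposal discussed in this thread concrete, here is a hypothetical C model of the turn-taking: a side may only put its queued messages on the wire while it holds the turn, every flush ends with an OVER marker, and receiving the peer's OVER hands the turn back. It models message kinds only, not real Wayland messages, and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum msg_kind { MSG_NORMAL, MSG_OVER };

#define QCAP 64

struct endpoint {
    bool my_turn;            /* may we put messages on the wire? */
    enum msg_kind out[QCAP]; /* queued, not yet sent */
    size_t out_len;
};

/* Queue an outgoing message; it waits until we hold the turn. */
static bool ep_queue(struct endpoint *ep, enum msg_kind k)
{
    assert(k != MSG_OVER); /* OVER is appended only by ep_flush */
    if (ep->out_len == QCAP)
        return false;
    ep->out[ep->out_len++] = k;
    return true;
}

/* If we hold the turn, copy the queue plus a trailing OVER to wire[] and
 * give the turn away. Returns the number of messages written, or 0 if it
 * is not our turn or wire[] is too small. */
static size_t ep_flush(struct endpoint *ep, enum msg_kind *wire, size_t cap)
{
    if (!ep->my_turn || ep->out_len + 1 > cap)
        return 0;
    for (size_t i = 0; i < ep->out_len; i++)
        wire[i] = ep->out[i];
    wire[ep->out_len] = MSG_OVER;
    size_t n = ep->out_len + 1;
    ep->out_len = 0;
    ep->my_turn = false;
    return n;
}

/* Process one incoming message; the peer's OVER hands us the turn. A real
 * implementation would also wait for the layers above to finish handling
 * every message that preceded the OVER. */
static void ep_receive(struct endpoint *ep, enum msg_kind k)
{
    if (k == MSG_OVER)
        ep->my_turn = true;
}
```

Even in this toy form the cost Pekka points out is visible: a side with nothing to say must still send a bare OVER to release the turn, so an idle connection either bounces OVER markers back and forth or needs an extra rule for parking the turn, and a side without the turn cannot send a spontaneous message at all.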
Re: Questions about object ID lifetimes
Has altering the wire format to contain all the info needed for unambiguous decoding of each message entirely within libwayland without needing to know the object ID -> type mapping been considered? It would make the messages longer, but this seems like it wouldn't be very bad for performance because wire message transfer is roughly aligned with user interaction speeds. Also, for any compositor/client pair, as long as they both use the same version of libwayland, the necessary wire format change would not result in compatibility issues. It would for statically linked cases, or similar mismatching cases (flatpak, appimage, snap, etc. unless the host version is mapped in instead of the packaged one somehow). There also seem to be unused bits in the existing wire format so that one could detect a compositor/client incompatibility at least on one end. I'm not suggesting that unambiguous decoding of all messages is a sufficient fix, but it is a necessary one. There are still speculative computation issues that it wouldn't resolve alone.
Re: Questions about object ID lifetimes
Pekka, After thinking more about what you said, I'm no longer optimistic. First, you are correct that my observation about opposite-side (side A-ranged ID vs. side B destructor) only works for middleware, and then only if the compositor and clients already handle their issues properly. Secondly, when thinking about the case of a message that arrives after an object has been deleted with new_ids in it, it occurs to me that this is a special case of a greater problem due to the existence of speculative computation as a result of the protocol's asynchrony. Any time there are at least two messages that don't commute with each other (and destruction is a case of a message that never commutes with any other message to the same object) where the two messages can be sent from opposite sides, at least one of them has to be undone somehow. And that undoing has to include state changes that preceded it on its sending side that didn't take into account the other (non-undone) message. This is bad. It wouldn't be so bad if the protocol used some old-time mutexes or database read-vs-write transactional consistency preservation mechanisms. But those require quite a bit of input from higher levels (above libwayland). And there's deadlock to deal with. The easiest fix I can think of is to go full-on half duplex. Meaning that each side doesn't send a single message until it has fully processed all messages sent to it in the order they arrive (thankfully, sockets preserve message order, else this would be much harder). Have you considered half duplex? Certainly, it would mean a loss of some concurrency, hence a potential performance hit. But probably not that much in this case, as most of the message back-and-forth in Wayland occurs at user-interaction speeds, while the speed-needing stuff happens through fd sharing and similar things outside the protocol. I think it can be made mostly backward compatible.
It would probably require some "all done" interaction between libwayland and higher levels on each side, but that's probably (hopefully) not too hard. There may even be a way to automate the "all done" interaction to make this fully backward compatible, because libwayland knows when there are no more messages to be processed on the wire, and it can queue-up the messages on each side before placing them on the wire. It might need to do things like re-order ping/pong messages with respect to the others to make sure the pinging side (compositor) doesn't declare the client dead while waiting. But that seems minor, as long as all such ping/pong pairs are opaque to the remainder of the protocol, hence always commute with other messages. As for my own middleware project, I think I will try to detect message decoding issues in all cases by keeping the most recent two types of each ID, and attempting to decode both ways (most recent first). There are fortunately a bunch of internal consistency checks that can be done, such as length of overall message vs. length of args vs. string length vs. null string termination, etc. But if the middleware gets a message that passes these decoding consistency checks for both of those types, then depending on what it is trying to do (as in one of my use cases, securing a sandboxed application), it may have to cut off the client.
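A sketch of the decoding consistency check described above, assuming libwayland-style signature letters for the arguments (i, u, f, o, n, a, h, s, with '?' and leading version digits ignored) and the wire conventions that every argument is padded to 32 bits, strings and arrays are length-prefixed (a string's length counts its terminating NUL, and 0 means a null string), and fds travel out of band. The function names and the two-candidate inputs are hypothetical; a real checker could add more tests, such as rejecting a 0 object ID for a non-nullable argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads one 32-bit word in native byte order; fails if fewer than 4 bytes remain. */
static bool read_u32(const unsigned char *body, size_t len, size_t *off, uint32_t *v)
{
    assert(*off <= len);
    if (len - *off < 4)
        return false;
    memcpy(v, body + *off, 4);
    *off += 4;
    return true;
}

/* True if body (the message minus its 8-byte header) is consistent with sig. */
static bool body_matches(const unsigned char *body, size_t len, const char *sig)
{
    size_t off = 0;

    for (const char *c = sig; *c != '\0'; c++) {
        uint32_t v;

        switch (*c) {
        case 'i': case 'u': case 'f': case 'o': case 'n':
            if (!read_u32(body, len, &off, &v))
                return false;
            break;
        case 's': case 'a': {
            if (!read_u32(body, len, &off, &v) || v > len - off)
                return false;
            size_t padded = ((size_t)v + 3) & ~(size_t)3;
            if (padded > len - off)
                return false;
            /* a non-null string must end with its NUL inside the stated length */
            if (*c == 's' && v > 0 && body[off + v - 1] != '\0')
                return false;
            off += padded;
            break;
        }
        default:
            break; /* 'h' (fd, sent out of band), '?' and version digits */
        }
    }
    return off == len; /* the message length must be fully accounted for */
}

/* newest_sig / previous_sig: signatures for the header's opcode under the most
 * recent and the previous type bound to the ID. Returns 0 or 1 for the chosen
 * candidate, -1 if neither fits, -2 if both fit (ambiguous). */
static int pick_candidate(const unsigned char *body, size_t len,
                          const char *newest_sig, const char *previous_sig)
{
    bool a = newest_sig != NULL && body_matches(body, len, newest_sig);
    bool b = previous_sig != NULL && body_matches(body, len, previous_sig);

    if (a && b)
        return -2;
    if (a)
        return 0;
    if (b)
        return 1;
    return -1;
}
```

A result of -2 is the case where, depending on the use case, the middleware may have to cut off the client.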
Re: Questions about object ID lifetimes
On Thu, 14 Sep 2023 15:10:48 -0400 jleivent wrote: > On Thu, 14 Sep 2023 16:32:06 +0300 > Pekka Paalanen wrote: > > > > > As an aside, we collect unfixable issues under > > https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next > > These are issues that are either impossible or very difficult or > > annoying to fix while keeping backward compatibility with both servers > > and clients. > > Only 7 of them? Some of them are bags of issues. Feel free to collect your ideas in new issues though. You may find interested people. > > -- > > > > Object ID re-use is what I would call "aggressive": in the libwayland > > C implementation, the object ID last freed is the first one to be > > allocated next. There are two separate allocation ranges each with its > > own free list: server and client allocated IDs. > > After I sent the initial post, I realized that the two separate > ID ranges help in the following way: > > For any object ID in the allocation range of side A, a destructor > message from side B does not need acknowledgement. This is because B > can't introduce a new object bound to that ID, only A can. Hence, any > new_id arg for that ID is an acknowledgement of the destruction. That's an interesting interpretation. Perhaps it works for your case, but I would not use it in regular clients and servers, because I'd like to be able to catch ID re-use errors like libwayland does. > However, B has to be careful to ignore messages containing that ID > until it sees one with the ID as a new_id arg. After the destructor > message from B but before a subsequent new_id for that ID from A, B > should not use the ID as arguments to other messages (and attempts to > do so can be dropped). And this can be automated provided the > destructor tag can be relied on. > > > > > The C implementation also poses an additional restriction: a new ID > > can be at most the largest ever allocated ID + 1.
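The allocation policy quoted above (most recently freed ID reused first, otherwise the largest ever allocated ID + 1, in two separate ranges) can be modeled with a LIFO free list. This is a hypothetical sketch for middleware, not libwayland's struct wl_map; it assumes a small fixed capacity and takes 1 and 0xff000000 as the first client and server IDs, which is my understanding of libwayland's split. The bound check illustrates the kind of automatic error detection the restriction makes possible for a mirror of the peer's range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALLOC_CAP 256 /* hypothetical capacity for the sketch */

/* Model of one side's ID range; not libwayland's struct wl_map. */
struct id_alloc {
    uint32_t base;                /* first ID of the range, e.g. 1 or 0xff000000 */
    uint32_t next_unused;         /* offset of the next never-allocated ID */
    uint32_t free_ids[ALLOC_CAP]; /* LIFO: last freed is reused first */
    uint32_t free_count;
};

static void id_alloc_init(struct id_alloc *a, uint32_t base)
{
    a->base = base;
    a->next_unused = 0;
    a->free_count = 0;
}

/* Returns the next ID to use, or 0 if the range is exhausted. */
static uint32_t id_alloc_new(struct id_alloc *a)
{
    if (a->free_count > 0)
        return a->free_ids[--a->free_count];
    if (a->next_unused >= ALLOC_CAP)
        return 0;
    return a->base + a->next_unused++;
}

static void id_alloc_free(struct id_alloc *a, uint32_t id)
{
    assert(id >= a->base && id - a->base < a->next_unused);
    assert(a->free_count < ALLOC_CAP);
    a->free_ids[a->free_count++] = id;
}

/* Check for a peer-chosen new_id: it may not exceed largest-ever + 1.
 * (Whether a lower ID is actually free must be checked separately.) */
static bool id_within_bound(const struct id_alloc *a, uint32_t id)
{
    return id >= a->base && id - a->base <= a->next_unused;
}
```

For example, after allocating 1, 2 and 3 and then freeing 2 and 1, the next three allocations return 1, 2 and 4.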
> > > > All this is to keep the ID map as compact as possible without a hash > > table. These details are in the implementation of the private 'struct > > wl_map' in libwayland. > > Obviously, that helps middleware as well, for the same reasons. It also > makes more automatic error detection possible. > > > ... > > > > Your whole above analysis is completely correct! > > I was rather hoping things would turn out less complex than they > seemed... > > > > > > However, the other cases are not as easy to identify. > > > > > > The other cases are: > > > 1. an object created by a client request that has destructor events > > > 2. an object created by the compositor > > > > > > It might be true that case 1 does not exist. Is there a general > > > rule ensuring that such cases would never be considered in future > > > expansions of the Wayland protocol? > > > > Destructor events do exist. Tagging them as such in the XML was not > > done from the beginning though, it was added later in a > > backward-compatible manner which makes the tag more informational than > > something libwayland could automatically process. The foremost example > > is wl_callback.done event. This is only safe because it is guaranteed > > that the client cannot be sending a request on wl_callback at the same > > time the server is sending 'done' and destroying the object: > > wl_callback has no requests defined at all. > > Fortunately, my point above about the advantage of the separate ID > ranges helps here. If wl_callback is created by the client, then a > wl_callback.done event tagged as a destructor does not need > acknowledgement AND is always safe provided that messages involving the > wl_callback ID (other than its eventual reuse as a new_id arg) are > ignored above libwayland. I think there is another asymmetry here. libwayland-client definitely ignores events on an object ID whose wl_proxy has been destroyed from the API user point of view.
libwayland-server though seems to be throwing a protocol error immediately on any non-existing object ID receiving a message.

I think there is a case that requires the delete_id event and cannot work solely on the new_id ID re-use rule:

- client sends request to create wl_callback
- client destroys the wl_callback (no request, enters zombie state client side)
- client creates some other new object X

When the server destroys the wl_callback, it sends the done event, and delete_id event, and cannot know if the client has already destroyed the wl_callback or not. Zombie IDs are not eligible for re-use. The zombie is destroyed and the ID freed on delete_id event. If the client has not seen the delete_id yet, it allocates a new high ID for object X. Otherwise, it re-uses the wl_callback's old ID for X.

However, there is no delete_id in the opposite direction, meaning that a similar situation in the opposite direction is racy and can lead to confusing object IDs. I guess this is what you already found out. Zombies are actually what allows lib
Re: Questions about object ID lifetimes
On Thu, 14 Sep 2023 16:32:06 +0300 Pekka Paalanen wrote: > ... > > congratulations, I think you may have found everything that is not > quite right in the fundamental Wayland protocol design. :-) Oh, you flatter me. I'm sure there's more! > > As an aside, we collect unfixable issues under > https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next > These are issues that are either impossible or very difficult or > annoying to fix while keeping backward compatibility with both servers > and clients. Only 7 of them? > > -- > > Object ID re-use is what I would call "aggressive": in the libwayland > C implementation, the object ID last freed is the first one to be > allocated next. There are two separate allocation ranges each with its > own free list: server and client allocated IDs. After I sent the initial post, I realized that the two separate ID ranges help in the following way: For any object ID in the allocation range of side A, a destructor message from side B does not need acknowledgement. This is because B can't introduce a new object bound to that ID, only A can. Hence, any new_id arg for that ID is an acknowledgement of the destruction. However, B has to be careful to ignore messages containing that ID until it sees one with the ID as a new_id arg. After the destructor message from B but before a subsequent new_id for that ID from A, B should not use the ID as arguments to other messages (and attempts to do so can be dropped). And this can be automated provided the destructor tag can be relied on. > > The C implementation also poses an additional restriction: a new ID > can be at most the largest ever allocated ID + 1. > > All this is to keep the ID map as compact as possible without a hash > table. These details are in the implementation of the private 'struct > wl_map' in libwayland. Obviously, that helps middleware as well, for the same reasons. It also makes more automatic error detection possible. > ...
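A sketch of that rule from the point of view of one side ("self"), assuming the libwayland range split in which client-allocated IDs lie below 0xff000000 and server-allocated IDs start there. The struct and function names are hypothetical and only the ID bookkeeping is modeled. Pekka notes elsewhere in the thread that libwayland-server raises a protocol error for messages to unknown IDs rather than ignoring them, so this is a middleware-side interpretation, not current libwayland behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed split: client-allocated IDs below this value, server-allocated at or above. */
#define SERVER_ID_START 0xff000000u

enum side { SIDE_CLIENT, SIDE_SERVER };

/* Which side is allowed to introduce a new object with this ID. */
static enum side id_allocator(uint32_t id)
{
    assert(id != 0); /* 0 is the null object */
    return id < SERVER_ID_START ? SIDE_CLIENT : SIDE_SERVER;
}

/* Self's view of one ID allocated by the peer. */
struct peer_id_view {
    uint32_t id;
    bool destroyed_by_self; /* self sent a destructor, peer has not reused the ID yet */
};

/* Self destroys a peer-allocated ID: no ack needed, since only the peer can reuse it. */
static void self_destroys(struct peer_id_view *v, enum side self)
{
    assert(id_allocator(v->id) != self);
    v->destroyed_by_self = true;
}

/* Should an incoming message that mentions this ID be processed?
 * is_new_id: the ID appears as a new_id argument. */
static bool accept_incoming(struct peer_id_view *v, bool is_new_id)
{
    if (!v->destroyed_by_self)
        return true;
    if (is_new_id) {          /* peer reused the ID: implicit acknowledgement */
        v->destroyed_by_self = false;
        return true;
    }
    return false;             /* stale message about the destroyed object: ignore */
}
```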
> > Your whole above analysis is completely correct! I was rather hoping things would turn out less complex than they seemed... > > > However, the other cases are not as easy to identify. > > > > The other cases are: > > 1. an object created by a client request that has destructor events > > 2. an object created by the compositor > > > > It might be true that case 1 does not exist. Is there a general > > rule ensuring that such cases would never be considered in future > > expansions of the Wayland protocol? > > Destructor events do exist. Tagging them as such in the XML was not > done from the beginning though, it was added later in a > backward-compatible manner which makes the tag more informational than > something libwayland could automatically process. The foremost example > is wl_callback.done event. This is only safe because it is guaranteed > that the client cannot be sending a request on wl_callback at the same > time the server is sending 'done' and destroying the object: > wl_callback has no requests defined at all. Fortunately, my point above about the advantage of the separate ID ranges helps here. If wl_callback is created by the client, then a wl_callback.done event tagged as a destructor does not need acknowledgement AND is always safe provided that messages involving the wl_callback ID (other than its eventual reuse as a new_id arg) are ignored above libwayland. But again, this means the destructor tag is important and not merely informational. I did notice that the destructor tagging was added mostly (or solely) to help with code generation by wayland-scanner implementations in programming languages where destructors require some specific syntactic notation. But maybe destructor tagging is even better than that?
Maybe it would allow libwayland to automate more in a more robust way AND also allow for middleware that doesn't have to simulate all of the semantic-level interactions induced by protocol messages in order to merely keep track of how to decode messages. > > It also requires that nothing passes an existing wl_callback object as > an argument in any request. We have been merely lucky that no-one has > done that. It's really hard to imagine a use case where you would want > to pass an existing wl_callback to anything. Again, the above separate ID ranges point addresses this, I think. > > Extensions may have similar objects that only deliver some one-off > events and then "self-destruct" by the final event. All this is simply > documented and not marked in the XML. That's what I was hoping to avoid. If there are object types where object lifetime can only be understood by simulating all of the relevant semantic content of the messages involved, then that's not good for middleware. Isn't it also problematic for the goals of libwayland, because it makes it impossible for libwayland to ensure that messages are properly decoded without trusting that the client and/or compositor have implemen
Re: Questions about object ID lifetimes
On Wed, 13 Sep 2023 20:16:09 -0400 jleivent wrote: > Forgive the long post. Tl;dr: what are the rules of object ID lifetime > and reuse in the Wayland protocol? Hi, congratulations, I think you may have found everything that is not quite right in the fundamental Wayland protocol design. :-) As an aside, we collect unfixable issues under https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next These are issues that are either impossible or very difficult or annoying to fix while keeping backward compatibility with both servers and clients. -- Object ID re-use is what I would call "aggressive": in the libwayland C implementation, the object ID last freed is the first one to be allocated next. There are two separate allocation ranges each with its own free list: server and client allocated IDs. The C implementation also poses an additional restriction: a new ID can be at most the largest ever allocated ID + 1. All this is to keep the ID map as compact as possible without a hash table. These details are in the implementation of the private 'struct wl_map' in libwayland. > I am attempting to understand the rules of object ID lifetime within > the Wayland protocol in order to construct Wayland middleware (similar > to some of the tools featured on > https://wayland.freedesktop.org/extras.html). I could not find a > comprehensive discussion of the details online. If one exists, I would > greatly appreciate a link! > > Middleware tools that wish to decode Wayland messages sent between the > compositor and its clients need to maintain an accurate mapping between > object ID and object interface (type). This is needed because the wire > protocol's message header includes only the target object ID and an > opcode that is relative to the object's type (the message header also > includes the message length - about which I also have questions - to be > pursued later...). 
The message (request or event) and its argument > encoding can only be determined if the object ID -> type and type + > opcode -> message mappings are accurately maintained. The type + > opcode -> message mapping is static and can be extracted offline from > the protocol XML files. > > Since object IDs can be reused, it is important for the middleware to > understand when an ID can be reused and when it cannot be to avoid > errors in the ID -> type mapping. > > Because the Wayland protocol is asynchronous, any message that implies > destruction of an object should be acknowledged by the receiver before > the destructed object's ID is reused. > > Fortunately, certain events and requests have been tagged as > destructors in the protocol descriptions! > > Also fortunately, it appears (based on reading the wl_resource_destroy > code in wayland-server.c) that for many object IDs, specifically for > IDs of objects created by a client request (the ID appears as a new ID > arg of a request, and is thus in the client side of the ID range) and > for which the client makes a destructor request, the compositor will > always send a wl_display::delete_id event (assuming the > display_resource still exists for the client, which apparently would > only not be the case after the client connection is severed) to > acknowledge the destructor request. Any attempt to reuse that ID prior > to the wl_display::delete_id event can lead to confusion, and should be > avoided. Reuse of the ID after the wl_display::delete_id event should > not result in any confusion. > > [BTW: for the purpose of this discussion, an object is "created" when > it is introduced into a protocol message for the first time via a new_id > argument. It does not refer to the actual allocation of the object in > memory or to its initialization.] Your whole above analysis is completely correct! > However, the other cases are not as easy to identify. > > The other cases are: > 1. 
an object created by a client request that has destructor events > 2. an object created by the compositor > > It might be true that case 1 does not exist. Is there a general rule > ensuring that such cases would never be considered in future expansions > of the Wayland protocol? Destructor events do exist. Tagging them as such in the XML was not done from the beginning though, it was added later in a backward-compatible manner which makes the tag more informational than something libwayland could automatically process. The foremost example is wl_callback.done event. This is only safe because it is guaranteed that the client cannot be sending a request on wl_callback at the same time the server is sending 'done' and destroying the object: wl_callback has no requests defined at all. It also requires that nothing passes an existing wl_callback object as an argument in any request. We have been merely lucky that no-one has done that. It's really hard to imagine a use case where you would want to pass an existing wl_callback to anything. Extensions may have similar objects
Questions about object ID lifetimes
Forgive the long post. Tl;dr: what are the rules of object ID lifetime and reuse in the Wayland protocol? I am attempting to understand the rules of object ID lifetime within the Wayland protocol in order to construct Wayland middleware (similar to some of the tools featured on https://wayland.freedesktop.org/extras.html). I could not find a comprehensive discussion of the details online. If one exists, I would greatly appreciate a link! Middleware tools that wish to decode Wayland messages sent between the compositor and its clients need to maintain an accurate mapping between object ID and object interface (type). This is needed because the wire protocol's message header includes only the target object ID and an opcode that is relative to the object's type (the message header also includes the message length - about which I also have questions - to be pursued later...). The message (request or event) and its argument encoding can only be determined if the object ID -> type and type + opcode -> message mappings are accurately maintained. The type + opcode -> message mapping is static and can be extracted offline from the protocol XML files. Since object IDs can be reused, it is important for the middleware to understand when an ID can be reused and when it cannot be to avoid errors in the ID -> type mapping. Because the Wayland protocol is asynchronous, any message that implies destruction of an object should be acknowledged by the receiver before the destructed object's ID is reused. Fortunately, certain events and requests have been tagged as destructors in the protocol descriptions! 
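A sketch of decoding the fixed 8-byte message header, assuming the libwayland wire layout: a 32-bit target object ID, then a 32-bit word holding the total message size in bytes (header included) in its upper 16 bits and the opcode in its lower 16 bits, both in the sender's native byte order. struct wire_header and parse_header are hypothetical names. The resulting (object_id, opcode) pair is exactly what the ID -> type and type + opcode -> message mappings are needed to interpret.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded form of the 8-byte header that starts every message. */
struct wire_header {
    uint32_t object_id; /* target object of the request or event */
    uint16_t opcode;    /* index into the target interface's requests or events */
    uint16_t size;      /* total message size in bytes, header included */
};

/* Returns 0 on success, -1 if buf is too short or the size field is invalid. */
static int parse_header(const unsigned char *buf, size_t len, struct wire_header *out)
{
    uint32_t words[2];

    assert(buf != NULL && out != NULL);
    if (len < sizeof(words))
        return -1;
    memcpy(words, buf, sizeof(words)); /* native byte order, no unaligned reads */

    out->object_id = words[0];
    out->size = (uint16_t)(words[1] >> 16);
    out->opcode = (uint16_t)(words[1] & 0xffffu);

    /* a message cannot be shorter than its header and is 32-bit aligned */
    if (out->size < 8 || out->size % 4 != 0)
        return -1;
    return 0;
}
```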
Also fortunately, it appears (based on reading the wl_resource_destroy code in wayland-server.c) that for many object IDs, specifically for IDs of objects created by a client request (the ID appears as a new ID arg of a request, and is thus in the client side of the ID range) and for which the client makes a destructor request, the compositor will always send a wl_display::delete_id event (assuming the display_resource still exists for the client, which apparently would only not be the case after the client connection is severed) to acknowledge the destructor request. Any attempt to reuse that ID prior to the wl_display::delete_id event can lead to confusion, and should be avoided. Reuse of the ID after the wl_display::delete_id event should not result in any confusion.

[BTW: for the purpose of this discussion, an object is "created" when it is introduced into a protocol message for the first time via a new_id argument. It does not refer to the actual allocation of the object in memory or to its initialization.]

However, the other cases are not as easy to identify. The other cases are:
1. an object created by a client request that has destructor events
2. an object created by the compositor

It might be true that case 1 does not exist. Is there a general rule ensuring that such cases would never be considered in future expansions of the Wayland protocol?

For objects created by the compositor, there are 2 subcases:
2a. objects with only destructor events
2b. objects with destructor requests

Again, it might be the case that 2b does not exist, as it is analogous to case 1 above. But is there a general rule against such future cases as well? Combining 1 and 2b, is there a general rule that says that only the object creator can initiate an object's destruction (unprovoked by the other side of the protocol)?
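For the case described as safe above (client-created objects destroyed by a client destructor request), the middleware bookkeeping can be sketched as a small state machine: keep the ID -> interface mapping while waiting for wl_display::delete_id, so that events already in flight can still be decoded, and reject reuse of the ID until delete_id arrives. This assumes destructor requests are identified via the XML destructor tag; the names and the fixed-size table are hypothetical simplifications, and compositor-created objects (cases 2a and 2b) are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 1024 /* hypothetical cap; client IDs at or above it are rejected */

enum id_state { ID_FREE = 0, ID_LIVE, ID_AWAITING_DELETE_ID };

struct id_entry {
    enum id_state state;
    const char *interface; /* e.g. "wl_surface", taken from the protocol XML */
};

struct id_tracker {
    struct id_entry ids[MAX_TRACKED]; /* zero-initialized means all IDs free */
};

/* A request carried this ID as new_id. Returns -1 if the ID is still live or
 * its destruction has not been acknowledged with delete_id yet. */
static int on_client_new_id(struct id_tracker *t, uint32_t id, const char *iface)
{
    assert(t != NULL);
    if (id == 0 || id >= MAX_TRACKED || t->ids[id].state != ID_FREE)
        return -1;
    t->ids[id].state = ID_LIVE;
    t->ids[id].interface = iface;
    return 0;
}

/* The client sent a request tagged as a destructor: keep the type so in-flight
 * events can still be decoded, but block reuse of the ID. */
static int on_client_destructor(struct id_tracker *t, uint32_t id)
{
    if (id >= MAX_TRACKED || t->ids[id].state != ID_LIVE)
        return -1;
    t->ids[id].state = ID_AWAITING_DELETE_ID;
    return 0;
}

/* The compositor sent wl_display::delete_id. Also accepted for a still-live ID,
 * as happens after a destructor event such as wl_callback.done. */
static int on_delete_id(struct id_tracker *t, uint32_t id)
{
    if (id >= MAX_TRACKED || t->ids[id].state == ID_FREE)
        return -1;
    t->ids[id].state = ID_FREE;
    t->ids[id].interface = NULL;
    return 0;
}

/* Interface to use when decoding a message addressed to id, or NULL if unknown. */
static const char *lookup_interface(const struct id_tracker *t, uint32_t id)
{
    if (id >= MAX_TRACKED || t->ids[id].state == ID_FREE)
        return NULL;
    return t->ids[id].interface;
}
```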
For object IDs created by the compositor and with only destructor events (case 2a), it may be necessary to understand the details of each interface in question to decide when the ID can be reused, as there is no universal destructor acknowledgement request comparable to the wl_display::delete_id event. A requirement to understand the details to that level would make middleware development more difficult. Insert extreme sadness emoji here.

Thankfully, it seems that destructor events are themselves acknowledgements of requests for destruction by the client (such as wp_drm_lease_device_v1::released event destructor vs. wp_drm_lease_device_v1::release request), or involve objects with a very limited lifetime and usage, such as callbacks (wp_presentation_feedback, zwp_linux_buffer_release, and zwp_fullscreen_shell_mode_feedback_v1). These limited lifetime/usage objects are created with the knowledge that all messages for them are destructor events, and that they are not involved in any other messages (as targets or arguments). Hence their destruction needs no further acknowledgement because the request for destruction was implied by their creation. The destructor event is the acknowledgement of that request.

Is this a general rule: that a destructor event is always the acknowledgement of a (perhaps implied) destruction request?

So there may be two general simple rules that the middleware can follow to maintain a proper ID -> type mapping through ID reuse cycles: 1. reuse of ID is allowed after w