On Thu, 14 Sep 2023 16:32:06 +0300 Pekka Paalanen <ppaala...@gmail.com> wrote:
> ... > > congratulations, I think you may have found everything that is not > quite right in the fundamental Wayland protocol design. :-) Oh, you flatter me. I'm sure there's more! > > As an aside, we collect unfixable issues under > https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next > These are issues that are either impossible or very difficult or > annoying to fix while keeping backward compatibility with both servers > and clients. Only 7 of them? > > ------ > > Object ID re-use is what I would call "aggressive": in the libwayland > C implementation, the object ID last freed is the first one to be > allocated next. There are two separate allocation ranges each with its > own free list: server and client allocated IDs. After I sent the initial post, I realized that the two separate ID ranges help in the following way: For any object ID in the allocation range of side A, a destructor message from side B does not need acknowledgement. This is because B can't introduce a new object bound to that ID, only A can. Hence, any new_id arg for that ID is an acknowledgement of the destruction. However, B has to be careful to ignore messages containing that ID until it sees one with the ID as a new_id arg. After the destructor message from B but before a subsequent new_id for that ID from A, B should not use the ID as arguments to other messages (and attempts to do so can be dropped). And this can be automated provided the destructor tag can be relied on. > > The C implementation also poses an additional restriction: a new ID > can be at most the largest ever allocated ID + 1. > > All this is to keep the ID map as compact as possible without a hash > table. These details are in the implementation of the private 'struct > wl_map' in libwayland. Obviouly, that helps middleware as well, for the same reasons. It also makes more automatic error detection possible. > ... > > Your whole above analysis is completely correct! I was rather hoping things would turn out less complex than they seemed... > > > However, the other cases are not as easy to identify. > > > > The other cases are: > > 1. an object created by a client request that has destructor events > > 2. an object created by the compositor > > > > It might be true that case 1 does not exist. Is there a general > > rule against that such cases would never be considered in future > > expansions of the Wayland protocol? > > Destructor events do exist. Tagging them as such in the XML was not > done from the beginning though, it was added later in a > backward-compatible manner which makes the tag more informational than > something libwayland could automatically process. The foremost example > is wl_callback.done event. This is only safe because it is guaranteed > that the client cannot be sending a request on wl_callback at the same > time the server is sending 'done' and destroying the object: > wl_callback has no requests defined at all. Fortunately, my point above about the advantage of the separate ID ranges helps here. If wl_callback is created by the client, then a wl_callback.done event tagged as a destructor does not need acknowledgement AND is always safe provided that messages involving the wl_callback ID (other than it's eventual reuse as a new_id arg) are ignored above libwayland. But again, this means the destructor tag is important and not merely informational. I did notice that the destructor tagging was added mostly (or solely) to help with code generation by wayland-scanner implementations in programming languages where destructors require some specific syntactic notation. But maybe destructor tagging is even better than that? Maybe it would allow libwayland to automate more in a more robust way AND also allow for middleware that doesn't have to simulate all of the semantic level interactions induced by protocol messages in order to merely keep track of how to decode messages. > > It also requires that nothing passes an existing wl_callback object as > an argument in any request. We have been merely lucky that no-one has > done that. It's really hard to imagine a use case where you would want > to pass an existing wl_callback to anything. Again, the above separate ID ranges point addresses this, I think. > > Extensions may have similar objects that only deliver some one-off > events and then "self-destruct" by the final event. All this is simply > documented and not marked in the XML. That's what I was hoping to avoid. If there are object types where object lifetime can only be understood by simulating all of the relevant semantic content of the messages involved, then that's not good for middleware. Isn't it also problematic towards the goals of libwayland, because it makes it impossible for libwayland to ensure that messages are properly decoded without trusting that the client and/or compositor have implemented everything properly? Because demarshaling a message incorrectly seems like it would be bad news. > > The asymmetry in the fundamental protocol bookkeeping messages > (wl_display.delete_id) is unfortunate in hindsight. Client created and > client destroyed objects are best supported, anything else will > require very careful messaging design to avoid a race pitfall. We > have a proper tear-down sequence for a client destroying an object, > but we do not have a similar sequence for the server destroying an > object, where the client would ack the destruction before the object > ID is re-used. If you add one, it would be compatible with giving importance to destructor tagging if done as follows: The initial even proposing the removal of the object is NOT tagged as a destructor, but the acknowledgement from the client IS tagged as a destructor. That would become a case like the ones above where a destructor is issued by the side opposite the one owning the object ID range: such destructors can be the final message and don't themselves need further acknowledgement. > > > For objects created by the compositor, there are 2 subcases: > > > > 2a. objects with only destructor events > > 2b. objects with destructor requests > > > > Again, it might be the case that 2b does not exist, as it is > > analogous to case 1 above. But, is there a general rule against > > such future cases as well? Combining 1 and 2b, is there a general > > rule that says that only the object creator can initiate an > > object's destruction (unprovoked by the other side of the > > protocol)? > > There is no such rule, but nowadays the recommendation is to keep to > client created and client destroyed objects unless there are pressing > reasons to design something more delicate/fragile. > > 2b exists as wl_data_offer, created by wl_data_device.data_offer > event. Again, saved by my observation above about the safety added by separate object ID ranges for clients vs. compositor. > > I'm not aware of 2a existing in the wild, but they could I guess. > > Generally we try to avoid new_id in events, because it causes a race > by default: what if the server is sending a new_id event at the same > time the client destroys the originating object? The result is a > mismatch in object ID tracking between server and client, causing > unexpected failures perhaps much later. This can occur with the sides swapped, couldn't it? If a new_id arg appears in a message such that the destination side for that message can initiate destruction of the message's target object, then if things are not handled carefully, the new_id leaks (isn't ever reused). To prevent this leak, you either cannot allow that side to initiate destruction of the target object, or you have to keep at least the type information for the target object around until the destruction is acknowledged. If you did keep the type information, you could continue to decode the messages and send something like delete_id responses for each such new_id. > > When a client (or server) calls the respective API to destroy a > protocol object, libwayland guarantees that the object's > listeners/callbacks won't be called even if messages are still > received on that object until it is properly torn down. That makes > the user code easier to write, but it also throws the afterwards > received messages away. If that message was to create a new protocol > object, that won't happen, the ID won't be allocated. This causes the > mismatch. [I noticed the use of "zombies" in the client, but no parallel in the server. Also it seemed from a cursory analysis of the client code that the zombies are mostly to take care of cases that arise in multithreaded clients. Is that correct? Or are zombies related to this issue instead?] There is a domino effect as well: Messages involving that object as an argument that also have a different new_id argument have to be involved in the above processing as well, with delete_id sent for those new_id arguments. And so on for messages involving those new_id arguments. > > Libwayland cannot automatically destroy objects either, because it has > no information on how to do that properly and safely. The XML does not > tell for sure. Is it a goal for libwayland to be able to at least manage object IDs properly on its own? Because that would align with the needs of middleware. Also, I think it might not be too late (in backward compatibility terms) to do something about this. > ... > > > > > A. Ownership vs. destruction: An object created by a client can only > > have destructor requests, and an object created by the compositor > > can only have destructor events. > > False, violated in the wild. Examples: wl_callback created by client, > destroyed by server; wl_data_offer created by server, destroyed by > client. Again, that case addressed by the observation about separate ID ranges way above. > > > B. All destructor requests are followed by wl_display::delete_id > > events as acknowledgements > > True, as this is built in to libwayland. > > > C. All destructor events are themselves acknowledgements of > > (implicit) destruction requests. > > Close enough I suppose, assuming the events are correctly tagged as > destructors. As mentioned, the tag is heavily recommended but not > technically mandatory for the libwayland C bindings, because it was > introduced several years too late. If the destructor tag is necessary for other language bindings, that's sufficient for my purposes, I think. I can make sure I'm parsing protocol XML files that will work for languages that require destructor notation. But, if it isn't "technically mandatory", doesn't that risk the introduction of protocol for which wayland-scanner does not generate proper code for a language that requires destructor notation? > > > D. No object can be the target or argument of a message issued by > > one side after that side has issued a destructor request (explicit > > or implicit). > > Correct, I believe. An actual destructor request should be enforced > like that in libwayland. Allowing otherwise would defeat the point of > a destructor request or a non-destructor request intended to trigger a > destructor event. Nothing enforces the latter though. > > > Is this correct? Are these actual requirements that are enforced > > for the current protocol and future expansions? > > > > Can there be cases of objects created by the compositor, where the > > compositor proposes their destruction without any prior (implicit or > > explicit) request to do so from the client? If so, how are these > > I don't know of anything forbidding them. This again is the case where, as long as the ack from the client is what is tagged as being the "destructor", all is still OK. > Thanks, > pq I am still hopefull, especially after realizing the benefits of the separate ID ranges. The only case this leaves uncovered is object IDs in the compositor ID range where the compositor wants to initiate destruction of that object. And I believe that case can be handled automatically as well, possibly even in a backward-compatible way. The domimo effect case with chains of new_ids is complex, but I think there is a way to handle it at least semi-automatically. But, only if destructor tags (or something similar) can be relied on in future versions and extensions to the wayland protocol. My interests are in the possibility of robust middleware, but I think there is a convergence between three areas such that the same hopefully backward compatible additions yield positive results to all. These are: middleware, wayland-scanner bindings generation for languages with destructor notations, and completely managing the "meta" aspects of the protocol completely within libwayland. These meta aspects are about proper decoding (demarhsalling), proper reuse of object IDs and preventing ID leakage, and the prevention of bugs and/or excessive complexity that might be induced by attempts to manage these things above libwayland. And the ability to automatically check for errors in these cases and prevent their spread. Thanks, Jonathan