Re: Questions about object ID lifetimes

2023-09-27 Thread jleivent
On Wed, 27 Sep 2023 11:47:37 +0300
Pekka Paalanen  wrote:

> ..
> 
> You just need to tell meson where your build directory is, or cd into
> it first.
> 
> $ meson test -C build
> 
> or
> 
> $ cd build
> $ meson test
> 

Of course!


Re: Questions about object ID lifetimes

2023-09-27 Thread Pekka Paalanen
On Tue, 26 Sep 2023 10:56:03 -0400
jleivent  wrote:

> On Tue, 26 Sep 2023 11:53:07 +0300
> Pekka Paalanen  wrote:
> 
> > On Mon, 25 Sep 2023 12:05:30 -0400
> > jleivent  wrote:
> >   
> > > How do I get CI/CD capability turned on?  I tried building the unit
> > > tests locally, but get errors that suggest those tests need to be
> > > run in CI.  Issue 540 says I need to apply for the guest role - how
> > > do I do that?
> > 
> > I don't recall libwayland having anything that needs to be
> > specifically run in a CI environment, and if it does, it should
> > automatically skip outside of CI environment. Weston does this.
> > 
> > What errors did you get? How did you run them?
> > 
> > 'meson test' is the command.  
> 
> I get:
> 
> $ meson test
> 
> ERROR: No such build data file as
> '/home/jil/gits/wayland-idfix/meson-private/build.dat'.
> 
> I used this to build and install it:
> 
> $ meson build/ --prefix=/home/jil/gits/wayland-idfix/install/
> $ ninja -C build/ install
> 
> Since that didn't create the needed meson-private/build.dat, I thought
> that might get put in by the CI somehow.

You just need to tell meson where your build directory is, or cd into
it first.

$ meson test -C build

or

$ cd build
$ meson test


Thanks,
pq




Re: Questions about object ID lifetimes

2023-09-26 Thread jleivent
On Tue, 26 Sep 2023 11:53:07 +0300
Pekka Paalanen  wrote:

> On Mon, 25 Sep 2023 12:05:30 -0400
> jleivent  wrote:
> 
> > How do I get CI/CD capability turned on?  I tried building the unit
> > tests locally, but get errors that suggest those tests need to be
> > run in CI.  Issue 540 says I need to apply for the guest role - how
> > do I do that?  
> 
> I don't recall libwayland having anything that needs to be
> specifically run in a CI environment, and if it does, it should
> automatically skip outside of CI environment. Weston does this.
> 
> What errors did you get? How did you run them?
> 
> 'meson test' is the command.

I get:

$ meson test

ERROR: No such build data file as
'/home/jil/gits/wayland-idfix/meson-private/build.dat'.

I used this to build and install it:

$ meson build/ --prefix=/home/jil/gits/wayland-idfix/install/
$ ninja -C build/ install

Since that didn't create the needed meson-private/build.dat, I thought
that might get put in by the CI somehow.

> 
> I think applying for the guest role means that you can file an issue
> on the upstream project asking for the permission. At minimum, a
> maintainer needs to know your gitlab handle.

I'll do that.




Re: Questions about object ID lifetimes

2023-09-26 Thread Pekka Paalanen
On Mon, 25 Sep 2023 12:05:30 -0400
jleivent  wrote:

> How do I get CI/CD capability turned on?  I tried building the unit
> tests locally, but get errors that suggest those tests need to be run
> in CI.  Issue 540 says I need to apply for the guest role - how do I do
> that?

I don't recall libwayland having anything that needs to be specifically
run in a CI environment, and if it does, it should automatically skip
outside of CI environment. Weston does this.

What errors did you get? How did you run them?

'meson test' is the command.

I think applying for the guest role means that you can file an issue on
the upstream project asking for the permission. At minimum, a
maintainer needs to know your gitlab handle.

All this permission hassle is just to avoid people that want to steal
CPU time from CI runners for unrelated or unwanted purposes (like
building a complete Android OS image from scratch or cryptomining).


Thanks,
pq




Re: Questions about object ID lifetimes

2023-09-25 Thread jleivent
On Wed, 20 Sep 2023 10:05:51 -0400
jleivent  wrote:

> ..
> Here's a very wild suggestion that would eliminate it and still
> be compatible with Wayland 1.  Add a delete_id request without
> modifying the existing protocol.

I have a delete_id request hack, enhanced zombies everywhere, an LRU
ring for zombie reuse (when there's no delete_id requests) on the
server, all with full compatibility maintained and no protocol
additions (so it's fully drop-in compatible for clients and
servers) building and running in my limited testing on my
jonleivent/wayland-idfix fork.  My README explains it in depth.  I
would like this to eventually become a pull request, but I need to do
more testing first.  Which brings me to my question:

How do I get CI/CD capability turned on?  I tried building the unit
tests locally, but get errors that suggest those tests need to be run
in CI.  Issue 540 says I need to apply for the guest role - how do I do
that?

Thanks,
Jon


Re: Questions about object ID lifetimes

2023-09-21 Thread Michel Dänzer
On 9/20/23 18:29, jleivent wrote:
> On Wed, 20 Sep 2023 10:05:51 -0400
> jleivent  wrote:
> 
>> ...
>> I'm considering forking libwayland and working on one or both of these
>> fixes for my own use, because I don't want to implement some even
>> crazier things in middleware to compensate for the server ID reuse
>> problem.
>>
> 
> I keep getting "An error occurred while forking the project.  Please try
> again."
> 
> Am I locked out of forking wayland?

Has your account been verified per 
https://gitlab.freedesktop.org/freedesktop/freedesktop/-/wikis/home#warning-restrictions-due-to-spam-warning
 ?


-- 
Earthling Michel Dänzer            |  https://redhat.com
Libre software enthusiast          |  Mesa and Xwayland developer



Re: Questions about object ID lifetimes

2023-09-20 Thread jleivent
On Wed, 20 Sep 2023 10:05:51 -0400
jleivent  wrote:

> ...
> I'm considering forking libwayland and working on one or both of these
> fixes for my own use, because I don't want to implement some even
> crazier things in middleware to compensate for the server ID reuse
> problem.
> 

I keep getting "An error occurred while forking the project.  Please try
again."

Am I locked out of forking wayland?


Re: Questions about object ID lifetimes

2023-09-20 Thread jleivent
On Wed, 20 Sep 2023 11:30:19 +0300
Pekka Paalanen  wrote:

> ..
> > This might help reduce those anomalous messages and be compatible
> > with Wayland 1.  Reduce the greediness of object ID reuse by:
> > 
> > - not reusing any IDs unless at least some minimum number (256?)
> >   are free
> > 
> > - reuse the freed ones in LRU fashion  
> 
> Yeah, the free list could be a FIFO instead of a LIFO.
> 
> > There are other variations of this - the point of all being to
> > increase the time between when any ID becomes free and when it is
> > reused but without causing the ID maps to grow unreasonably large,
> > or causing their maintenance to slow down.
> > 
> > Increasing the time delay between freeing and reuse (such as with a
> > higher minimum free threshold above) would probably lead to lower
> > probability of anomalous messages. You could make this tunable
> > through an environment variable.
> > 
> > Note that the two sides don't have to agree to use this less-greedy
> > ID allocation for either side to use it - and it's really only
> > important for servers anyway.  
> 
> I'm wary of solutions that reduce the risk but do not eliminate it. If
> a protocol interface design turns out racy, it would be best to find
> that out sooner than later, and evaluate fixing it. Reproducibility
> helps analysis.
> 

Here's a very wild suggestion that would eliminate it and still
be compatible with Wayland 1.  Add a delete_id request without
modifying the existing protocol.

There are (at least) two pairs of ping/pong messages in the base
protocol: xdg_wm_base and wl_shell_surface.

From what I can tell (but I only have the wlroots code to look at),
when the client sends a pong that doesn't correspond to the most recent
ping, the server completely ignores it.  Also, the serial arg used in
pings starts low and is incremented.  Also, the servers tend to reset
the serial to 0 often.  So it never increments very high (even if it
never got reset, it's probably never going over 2^31-1).

This means it's possible to use a specially crafted pong as a
delete_id request.

The client could send a pong with the highest bit on (so it won't
accidentally match a real serial and ack a real ping) and the low bits
indicating the object ID whose deletion it is acking.

The server will, when it deletes one of its own objects, keep around at
least the type (interface) until it gets this pong.
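
A minimal sketch of the serial encoding described above, not the code from any fork. It assumes only server-allocated IDs are ever acked; libwayland starts those at 0xff000000, so they already have the top bit set and the crafted serial ends up equal to the ID itself. The names encode_delete_id and classify_pong are hypothetical, and the special "I am patched" serial is not covered here.

```python
SERVER_ID_START = 0xFF000000  # first server-allocated object ID in libwayland
HIGH_BIT = 0x80000000         # assumed marker: this pong is a delete_id ack
LOW_MASK = 0x7FFFFFFF         # low 31 bits carry the object ID


def encode_delete_id(object_id):
    """Build the pong serial that acks deletion of a server-side object."""
    if not SERVER_ID_START <= object_id <= 0xFFFFFFFF:
        raise ValueError("only server-allocated IDs are acked in this sketch")
    return HIGH_BIT | (object_id & LOW_MASK)


def classify_pong(serial):
    """Return ('delete_id', object_id) for a crafted pong, else ('pong', serial)."""
    if serial & HIGH_BIT:
        # Restore the top bit that every server-allocated ID has.
        return ("delete_id", HIGH_BIT | (serial & LOW_MASK))
    return ("pong", serial)
```

A real ping serial stays below 2^31 under the assumption stated above, so classify_pong(42) is treated as an ordinary pong.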

There are two versions of this: one is that clients using patched
libwayland libs send the pong on their own after seeing the server-side
object deletion.  Another is that a patched server sends a ping when it
wants to reuse an ID to force a matching pong of an unpatched client
(this one assumes a client won't queue a server-side object deletion
and pong the ping before processing the deletion, hence still be able
to send anomalous messages involving the deleted ID - so it's risky).

If the server is patched to wait for these delete_id pongs, but the
client is not, then at best the server could fall back to using a less
greedy reuse as with my previous suggestion.  A patched client could
signal it is patched by sending an unsolicited specially crafted pong
(serial arg = 0x) early on.

It might be nice to give users the ability to start out with an
unpatched libwayland, but if they think they are seeing clients getting
killed off due to deleted server IDs in their requests, they could
switch to using a patched "unauthorized" libwayland.  It probably
wouldn't be too hard to write a tool that parses WAYLAND_DEBUG output
to see if an issue is due to deleted server IDs.  They'd use the patched
libwayland at their own risk (but when isn't that the case?),
understanding that the "fix" is a bit of a hack.

I'm considering forking libwayland and working on one or both of these
fixes for my own use, because I don't want to implement some even
crazier things in middleware to compensate for the server ID reuse
problem.



Re: Questions about object ID lifetimes

2023-09-20 Thread Pekka Paalanen
On Tue, 19 Sep 2023 10:02:55 -0400
jleivent  wrote:

> On Tue, 19 Sep 2023 16:26:37 +0300
> Pekka Paalanen  wrote:
> 
> > ...  
> > > But aren't those fast frame updates done through shared fds?  Hence
> > > not part of the wire protocol, and would not be impacted by
> > > increasing the length of messages on the wire?
> > 
> > No. They are messages sent on the wire, telling "there is a new image
> > on that other fd I shared with you before, use that now", and so on.
> > That is usually a handful of requests per frame.  
> 
> Didn't realize that.
> > 
> > I would argue that "speculative" is not the right word here, it was
> > never intended.  
> 
> How about: there are "anomalous" messages and state changes?

I believe we tend to call them just race conditions, or racy messages.

Btw. about grouping messages, we hand-roll protocol for that too: there
is wl_surface.commit to latch a bunch of state, and the shell related
extensions have the final 'configure' event, and it's common to have a
'done' event on an interface that splits sending state into multiple
events. Several input interfaces have 'frame' event for the same. This
helps the receiving side to wait for the complete state transmission
before acting on it.
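
The latching pattern can be illustrated with a toy model. This is only a sketch of the idea behind wl_surface.commit, not libwayland or compositor code; the Surface class and its set_buffer and set_scale methods are hypothetical names.

```python
class Surface:
    """Toy double-buffered state: setters stage values, commit() latches them."""

    def __init__(self):
        self.current = {}   # state the receiver acts on
        self.pending = {}   # state received since the last commit

    def set_buffer(self, buffer):
        self.pending["buffer"] = buffer

    def set_scale(self, scale):
        self.pending["scale"] = scale

    def commit(self):
        # Apply everything staged so far in one step, then start a new batch.
        self.current.update(self.pending)
        self.pending = {}
```

Until commit() arrives, the receiver keeps using the old state, so it never acts on a half-transmitted update.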

It is inconvenient to have to design all this by hand every time in a
new interface, and I agree it would be nice if the wire protocol
foundation offered a solution somehow, but I'm not sure how that should
look in the hypothetical Wayland 2.

> > > tl;dr: protocol asynchrony leads to speculation that can result in
> > > the two sides disagreeing about the correct state of the world.  
> > 
> > We avoid that with careful protocol design in XML. There is exactly
> > that kind of situation in the xdg-family of extensions and it is
> > solved by sending a serial with the events and acking that serial
> > when the client acts on the events.
> > 
> > It's a known caveat.  
> 
> OK.
> 
> This might help reduce those anomalous messages and be compatible with
> Wayland 1.  Reduce the greediness of object ID reuse by:
> 
> - not reusing any IDs unless at least some minimum number (256?)
>   are free
> 
> - reuse the freed ones in LRU fashion

Yeah, the free list could be a FIFO instead of a LIFO.

> There are other variations of this - the point of all being to increase
> the time between when any ID becomes free and when it is reused but
> without causing the ID maps to grow unreasonably large, or causing their
> maintenance to slow down.
> 
> Increasing the time delay between freeing and reuse (such as with a
> higher minimum free threshold above) would probably lead to lower
> probability of anomalous messages. You could make this tunable through
> an environment variable.
> 
> Note that the two sides don't have to agree to use this less-greedy ID
> allocation for either side to use it - and it's really only important
> for servers anyway.

I'm wary of solutions that reduce the risk but do not eliminate it. If
a protocol interface design turns out racy, it would be best to find
that out sooner than later, and evaluate fixing it. Reproducibility
helps analysis.


Thanks,
pq




Re: Questions about object ID lifetimes

2023-09-19 Thread jleivent
On Tue, 19 Sep 2023 10:02:55 -0400
jleivent  wrote:
> ...
> This might help reduce those anomalous messages and be compatible with
> Wayland 1.  Reduce the greediness of object ID reuse by:
> 
> - not reusing any IDs unless at least some minimum number (256?)
>   are free
> 
> - reuse the freed ones in LRU fashion

This also needs something like zombies on the server side.  At least
retain the type info for a free ID until it is reused.
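
A rough sketch of what retaining the type info could look like, assuming a simple map from ID to interface name. ObjectMap and its methods are hypothetical and do not reflect libwayland's internal data structures.

```python
class ObjectMap:
    """Toy ID map that keeps the interface of a freed ID until the ID is reused."""

    def __init__(self):
        self.live = {}      # id -> interface name of live objects
        self.zombies = {}   # freed id -> interface name, kept until reuse

    def create(self, obj_id, interface):
        self.zombies.pop(obj_id, None)   # reusing the ID ends the zombie
        self.live[obj_id] = interface

    def destroy(self, obj_id):
        self.zombies[obj_id] = self.live.pop(obj_id)

    def lookup(self, obj_id):
        if obj_id in self.live:
            return ("live", self.live[obj_id])
        if obj_id in self.zombies:
            return ("zombie", self.zombies[obj_id])
        return ("unknown", None)
```

With the zombie entry in place, a late message naming a freed ID can still be decoded against the right interface and then discarded.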


Re: Questions about object ID lifetimes

2023-09-19 Thread jleivent
On Tue, 19 Sep 2023 16:26:37 +0300
Pekka Paalanen  wrote:

> ...
> > But aren't those fast frame updates done through shared fds?  Hence
> > not part of the wire protocol, and would not be impacted by
> > increasing the length of messages on the wire?  
> 
> No. They are messages sent on the wire, telling "there is a new image
> on that other fd I shared with you before, use that now", and so on.
> That is usually a handful of requests per frame.

Didn't realize that.
> 
> I would argue that "speculative" is not the right word here, it was
> never intended.

How about: there are "anomalous" messages and state changes?


> > tl;dr: protocol asynchrony leads to speculation that can result in
> > the two sides disagreeing about the correct state of the world.
> 
> We avoid that with careful protocol design in XML. There is exactly
> that kind of situation in the xdg-family of extensions and it is
> solved by sending a serial with the events and acking that serial
> when the client acts on the events.
> 
> It's a known caveat.

OK.

This might help reduce those anomalous messages and be compatible with
Wayland 1.  Reduce the greediness of object ID reuse by:

- not reusing any IDs unless at least some minimum number (256?)
  are free

- reuse the freed ones in LRU fashion

There are other variations of this - the point of all being to increase
the time between when any ID becomes free and when it is reused but
without causing the ID maps to grow unreasonably large, or causing their
maintenance to slow down.

Increasing the time delay between freeing and reuse (such as with a
higher minimum free threshold above) would probably lead to lower
probability of anomalous messages. You could make this tunable through
an environment variable.

Note that the two sides don't have to agree to use this less-greedy ID
allocation for either side to use it - and it's really only important
for servers anyway.
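
The two rules above can be sketched as a small allocator. This is an illustration under stated assumptions, not libwayland's allocator: IdAllocator, min_free and the method names are hypothetical, and the threshold of 256 is just the number floated above.

```python
from collections import deque


class IdAllocator:
    """Hand out fresh IDs until enough are free, then reuse the oldest first."""

    def __init__(self, first_id, min_free=256):
        self.next_fresh = first_id
        self.min_free = min_free
        self.free = deque()   # oldest freed ID sits at the left end

    def release(self, obj_id):
        self.free.append(obj_id)

    def allocate(self):
        # Reuse only when the free pool has reached the threshold, and then
        # take the ID that has been free the longest (FIFO, i.e. LRU).
        if len(self.free) >= self.min_free:
            return self.free.popleft()
        obj_id = self.next_fresh
        self.next_fresh += 1
        return obj_id
```

A larger min_free lengthens the gap between an ID being freed and being reused, at the cost of a larger ID map.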



Re: Questions about object ID lifetimes

2023-09-19 Thread Pekka Paalanen
On Mon, 18 Sep 2023 11:31:18 -0400
jleivent  wrote:

> On Mon, 18 Sep 2023 14:06:51 +0300
> Pekka Paalanen  wrote:
> 
> > On Sat, 16 Sep 2023 12:18:35 -0400
> > jleivent  wrote:
> >   
> > > The easiest fix I can think of is to go full-on half duplex.
> > > Meaning that each side doesn't send a single message until it has
> > > fully processed all messages sent to it in the order they arrive
> > > (thankfully, sockets preserve message order, else this would be
> > > much harder). Have you considered half duplex?
> > 
> > Never crossed my mind at least. I can't even imagine how it could be
> > implemented through a socket, because both sides must be able to
> > spontaneously send a message at any time.  
> 
> By taking turns.  Each side would, after queuing up a batch of
> messages, add an "Over!" message (from the days of half-duplex
> radio communications) to the end of that queue, and then send the whole
> queue (retaining its sequence).  Neither side would send a message
> until it receives the other side's "Over!" message, and until the
> higher levels above libwayland have had a chance to examine all
> messages prior to "Over!" in order to avoid sending an inconsistent
> message or even committing to a state incompatible with later messages.
> 
> >   
> > > Certainly, it would mean a loss
> > > of some concurrency, hence a potential performance hit.  But
> > > probably not that much in this case, as most of the message
> > > back-and-forth in Wayland occurs at user-interaction speeds, while
> > > the speed-needing stuff happens through fd sharing and similar
> > > things outside the protocol. I
> > 
> > That user interaction speed can be in the order of a kilohertz, for
> > gaming mice, at least in one direction. In the other direction,
> > surface update rate is also unlimited, games may want to push out
> > frames even if only every tenth gets displayed to reduce latency.
> > Also truly tearing screen updates are being developed.  
> 
> But aren't those fast frame updates done through shared fds?  Hence not
> part of the wire protocol, and would not be impacted by increasing the
> length of messages on the wire?

No. They are messages sent on the wire, telling "there is a new image
on that other fd I shared with you before, use that now", and so on.
That is usually a handful of requests per frame.

Likewise, every pointer motion event is one or multiple wire events.

Shared fds are used for sharing big chunks of data mostly, that is,
shared memory. But we don't use shared memory messaging nor locks. All
messaging is Wayland messages over the socket. After all, the XML files
describe wire messages.

We want everything to be in the same protocol stream as much as
possible to reduce race possibilities. If we had shared memory
messaging in addition to the unix socket, that would be two mutually
async protocol streams in the same direction. That would be quite a
pain, as we've learnt from Xwayland (you have both Wayland and X11
connections between the same two entities; as another matter, libX11 is
really eager to have blocking roundtrips, so if libwayland would also
block for something, a deadlock is practically guaranteed eventually).

> >   
> > > think it can be made mostly backward compatible. It would probably
> > > require some "all done" interaction between libwayland and higher
> > > levels on each side, but that's probably (hopefully) not too hard.
> > > There may even be a way to automate the "all done" interaction to
> > > make this fully backward compatible, because libwayland knows when
> > > there are no more messages to be processed on the wire, and it can
> > > queue-up the messages on each side before placing them on the wire.
> > >  It might need to do things like re-order ping/pong messages with
> > > respect to the others to make sure the pinging side (compositor)
> > > doesn't declare the client dead while waiting.  But that seems
> > > minor, as long as all such ping/pong pairs are opaque to the
> > > remainder of the protocol, hence always commute with other
> > > messages.
> > 
> > If you mean adding new ping/pong stuff, that doesn't sound very nice,
> > because Wayland also aims to be power efficient: if truly nothing is
> > happening, let the processes sleep. Anyone could still wake up any
> > time, and send a message.  
> 
> Not adding.  Dealing with the already existing (or if any new ones are
> added) ping/pong pairs.  Or any messages that really need to be timely,
> hence can't wait for messages in front of them to be fully processed.

There are no existing mandatory ping/pong messages. Some extensions
have some, but all extensions are by definition optional from the
libwayland point of view.

Wayland messages are strictly ordered per direction, there is zero
expectation or guarantee that anything could be re-ordered at
libwayland level.

> That could apply to any real-time requirements, like the gaming mice
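> messages you mentioned above.  But doing this in general is hard unless
> the messages are irrelevant to the rest of the protocol (hence commute
> with everything else), like ping/pong are.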

Re: Questions about object ID lifetimes

2023-09-18 Thread jleivent
On Mon, 18 Sep 2023 14:06:51 +0300
Pekka Paalanen  wrote:

> On Sat, 16 Sep 2023 12:18:35 -0400
> jleivent  wrote:
> 
> > The easiest fix I can think of is to go full-on half duplex.
> > Meaning that each side doesn't send a single message until it has
> > fully processed all messages sent to it in the order they arrive
> > (thankfully, sockets preserve message order, else this would be
> > much harder). Have you considered half duplex?  
> 
> Never crossed my mind at least. I can't even imagine how it could be
> implemented through a socket, because both sides must be able to
> spontaneously send a message at any time.

By taking turns.  Each side would, after queuing up a batch of
messages, add an "Over!" message (from the days of half-duplex
radio communications) to the end of that queue, and then send the whole
queue (retaining its sequence).  Neither side would send a message
until it receives the other side's "Over!" message, and until the
higher levels above libwayland have had a chance to examine all
messages prior to "Over!" in order to avoid sending an inconsistent
message or even committing to a state incompatible with later messages.
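
The turn-taking can be sketched as a toy simulation. Nothing like this exists in libwayland; the Side class, its methods and the literal "Over!" marker are hypothetical, and sockets are replaced by passing lists around.

```python
OVER = "Over!"


class Side:
    """One end of a half-duplex link: it may only send while holding the turn."""

    def __init__(self, has_turn):
        self.has_turn = has_turn
        self.outbox = []

    def queue(self, msg):
        self.outbox.append(msg)

    def flush(self):
        """Send the queued batch followed by the marker, and give up the turn."""
        if not self.has_turn:
            raise RuntimeError("cannot send before the peer says Over!")
        batch = self.outbox + [OVER]
        self.outbox = []
        self.has_turn = False
        return batch

    def receive(self, batch):
        """Collect the batch in order; the turn comes back at the marker."""
        seen = []
        for msg in batch:
            if msg == OVER:
                self.has_turn = True
            else:
                seen.append(msg)
        return seen
```

The cost is visible even in the toy: a side that wants to send but lacks the turn has to wait for the peer's next flush.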

> 
> > Certainly, it would mean a loss
> > of some concurrency, hence a potential performance hit.  But
> > probably not that much in this case, as most of the message
> > back-and-forth in Wayland occurs at user-interaction speeds, while
> > the speed-needing stuff happens through fd sharing and similar
> > things outside the protocol. I  
> 
> That user interaction speed can be in the order of a kilohertz, for
> gaming mice, at least in one direction. In the other direction,
> surface update rate is also unlimited, games may want to push out
> frames even if only every tenth gets displayed to reduce latency.
> Also truly tearing screen updates are being developed.

But aren't those fast frame updates done through shared fds?  Hence not
part of the wire protocol, and would not be impacted by increasing the
length of messages on the wire?

> 
> > think it can be made mostly backward compatible. It would probably
> > require some "all done" interaction between libwayland and higher
> > levels on each side, but that's probably (hopefully) not too hard.
> > There may even be a way to automate the "all done" interaction to
> > make this fully backward compatible, because libwayland knows when
> > there are no more messages to be processed on the wire, and it can
> > queue-up the messages on each side before placing them on the wire.
> >  It might need to do things like re-order ping/pong messages with
> > respect to the others to make sure the pinging side (compositor)
> > doesn't declare the client dead while waiting.  But that seems
> > minor, as long as all such ping/pong pairs are opaque to the
> > remainder of the protocol, hence always commute with other
> > messages.  
> 
> If you mean adding new ping/pong stuff, that doesn't sound very nice,
> because Wayland also aims to be power efficient: if truly nothing is
> happening, let the processes sleep. Anyone could still wake up any
> time, and send a message.

Not adding.  Dealing with the already existing (or if any new ones are
added) ping/pong pairs.  Or any messages that really need to be timely,
hence can't wait for messages in front of them to be fully processed.

That could apply to any real-time requirements, like the gaming mice
messages you mentioned above.  But doing this in general is hard unless
the messages are irrelevant to the rest of the protocol (hence commute
with everything else), like ping/pong are.

> 
> 
> On Sun, 17 Sep 2023 15:28:04 -0400
> jleivent  wrote:
> 
> > Has altering the wire format to contain all the info needed for
> > unambiguous decoding of each message entirely within libwayland
> > without needing to know the object ID -> type mapping been
> > considered?  
> 
> Not that I can recall. The wire format is ABI, libwayland is not the
> only implementation of it, so that would be Wayland 2 material.

So no changes to the wire format are possible under any circumstances
in Wayland 1?

> 
> > It would make the messages longer, but this seems like it wouldn't
> > be very bad for performance because wire message transfer is roughly
> > aligned with user interaction speeds.  
> 
> We need to be able to deal with at least a few thousand messages per
> second easily.
> 
> The overhead seems a bit bad if every message would need to carry its
> signature.

Encoding more into the message is only needed if there are no
destructor request acks (the equivalent of wl_display::delete_id, but
in the opposite direction).  But I was wondering why not do it for
robustness.

The signature isn't very big, but it's probably not needed even for
robustness.  What's needed is the target object type/version
information. Since from that both sides know the signature.  The issue
is just how to add robustness to the object ID -> type/version
mapping, which is the source of many problems.  The signatures ar

Re: Questions about object ID lifetimes

2023-09-18 Thread Pekka Paalanen
On Sat, 16 Sep 2023 12:18:35 -0400
jleivent  wrote:

> The easiest fix I can think of is to go full-on half duplex.  Meaning
> that each side doesn't send a single message until it has fully
> processed all messages sent to it in the order they arrive (thankfully,
> sockets preserve message order, else this would be much harder).
> Have you considered half duplex?

Never crossed my mind at least. I can't even imagine how it could be
implemented through a socket, because both sides must be able to
spontaneously send a message at any time.

> Certainly, it would mean a loss
> of some concurrency, hence a potential performance hit.  But probably
> not that much in this case, as most of the message back-and-forth in
> Wayland occurs at user-interaction speeds, while the speed-needing stuff
> happens through fd sharing and similar things outside the protocol. I

That user interaction speed can be in the order of a kilohertz, for
gaming mice, at least in one direction. In the other direction, surface
update rate is also unlimited, games may want to push out frames even
if only every tenth gets displayed to reduce latency. Also truly
tearing screen updates are being developed.

> think it can be made mostly backward compatible. It would probably
> require some "all done" interaction between libwayland and higher
> levels on each side, but that's probably (hopefully) not too hard.
> There may even be a way to automate the "all done" interaction to make
> this fully backward compatible, because libwayland knows when there are
> no more messages to be processed on the wire, and it can queue-up the
> messages on each side before placing them on the wire.  It might need
> to do things like re-order ping/pong messages with respect to the
> others to make sure the pinging side (compositor) doesn't declare the
> client dead while waiting.  But that seems minor, as long as all such
> ping/pong pairs are opaque to the remainder of the protocol, hence
> always commute with other messages.

If you mean adding new ping/pong stuff, that doesn't sound very nice,
because Wayland also aims to be power efficient: if truly nothing is
happening, let the processes sleep. Anyone could still wake up any
time, and send a message.


On Sun, 17 Sep 2023 15:28:04 -0400
jleivent  wrote:

> Has altering the wire format to contain all the info needed for
> unambiguous decoding of each message entirely within libwayland without
> needing to know the object ID -> type mapping been considered?

Not that I can recall. The wire format is ABI, libwayland is not the
only implementation of it, so that would be Wayland 2 material.

> It would make the messages longer, but this seems like it wouldn't be
> very bad for performance because wire message transfer is roughly
> aligned with user interaction speeds.

We need to be able to deal with at least a few thousand messages per
second easily.

The overhead seems a bit bad if every message would need to carry its
signature.

> Also, for any compositor/client pair, as long as they both use the same
> version of libwayland, the necessary wire format change would not
> result in compatibility issues.  It would for static linked cases,
> or similar mismatching cases (flatpak, appimage, snap, etc. unless
> the host version is mapped in instead of the packaged one somehow).
> There also seem to be unused bits in the existing wire format so that
> one could detect a compositor/client incompatibility at least on one
> end.

We've never had the requirement for compositor and clients to use the
same minor version of libwayland. There are also completely independent
Wayland implementations in other languages that expect to be
interoperable. Breaking all that seems unacceptable.

What unused bits did you find?

> I'm not suggesting that unambiguous decoding of all messages is a
> sufficient fix, but it is a necessary one.  There are still speculative
> computation issues that it wouldn't resolve alone.

I didn't understand what is speculative. There is no roll-back of any
kind on anything, what's computed is final.


Thanks,
pq




Re: Questions about object ID lifetimes

2023-09-17 Thread jleivent
Has altering the wire format to contain all the info needed for
unambiguous decoding of each message entirely within libwayland without
needing to know the object ID -> type mapping been considered?
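
To make the dependence on the object ID -> type mapping concrete, here is a sketch of the existing 8-byte message header: a 32-bit object ID, then a 32-bit word holding the message size in the upper 16 bits and the opcode in the lower 16, in host byte order. The helper names pack_header, unpack_header and interface_for are hypothetical.

```python
import struct


def pack_header(object_id, opcode, size):
    """Pack the two header words; size counts the whole message in bytes."""
    return struct.pack("=II", object_id, (size << 16) | opcode)


def unpack_header(data):
    """Return (object_id, opcode, size) from the first 8 bytes of a message."""
    object_id, word = struct.unpack_from("=II", data)
    return object_id, word & 0xFFFF, word >> 16


def interface_for(data, object_map):
    """The header names no type, so the receiver must consult its own map."""
    object_id, opcode, _size = unpack_header(data)
    return object_map[object_id], opcode   # KeyError if the ID is unknown
```

If the two sides disagree about what an ID currently refers to, the same bytes are decoded against the wrong interface, which is the failure mode under discussion.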

It would make the messages longer, but this seems like it wouldn't be
very bad for performance because wire message transfer is roughly
aligned with user interaction speeds.

Also, for any compositor/client pair, as long as they both use the same
version of libwayland, the necessary wire format change would not
result in compatibility issues.  It would for static linked cases,
or similar mismatching cases (flatpak, appimage, snap, etc. unless
the host version is mapped in instead of the packaged one somehow).
There also seem to be unused bits in the existing wire format so that
one could detect a compositor/client incompatibility at least on one
end.

I'm not suggesting that unambiguous decoding of all messages is a
sufficient fix, but it is a necessary one.  There are still speculative
computation issues that it wouldn't resolve alone.


Re: Questions about object ID lifetimes

2023-09-16 Thread jleivent
Pekka,

After thinking more about what you said, I'm no longer optimistic.

First, you are correct that my observation about opposite-side (side
A-ranged ID vs. side B destructor) only works for middleware, and then
only if the compositor and clients already handle their issues
properly.

Secondly, when thinking about the case of a message that arrives after
an object has been deleted with new_ids in it, it occurs to me that this
is a special case of a greater problem due to the existence of
speculative computation as a result of the protocol's asynchrony.  Any
time there are at least two messages that don't commute with each other
(and destruction is a case of a message that never commutes with any
other message to the same object) where the two messages can be sent
from opposite sides, at least one of them has to be undone somehow.  And
that undoing has to include state changes that preceded it on its
sending side that didn't take into account the other (non-undone)
message.  This is bad.

It wouldn't be so bad if the protocol used some old-time mutexes or
database read-vs-write transactional consistency preservation
mechanisms. But those require quite a bit of input from higher levels
(above libwayland).  And there's deadlock to deal with.

The easiest fix I can think of is to go full-on half duplex.  Meaning
that each side doesn't send a single message until it has fully
processed all messages sent to it in the order they arrive (thankfully,
sockets preserve message order, else this would be much harder).
Have you considered half duplex?  Certainly, it would mean a loss
of some concurrency, hence a potential performance hit.  But probably
not that much in this case, as most of the message back-and-forth in
Wayland occurs at user-interaction speeds, while the speed-needing stuff
happens through fd sharing and similar things outside the protocol. I
think it can be made mostly backward compatible. It would probably
require some "all done" interaction between libwayland and higher
levels on each side, but that's probably (hopefully) not too hard.
There may even be a way to automate the "all done" interaction to make
this fully backward compatible, because libwayland knows when there are
no more messages to be processed on the wire, and it can queue-up the
messages on each side before placing them on the wire.  It might need
to do things like re-order ping/pong messages with respect to the
others to make sure the pinging side (compositor) doesn't declare the
client dead while waiting.  But that seems minor, as long as all such
ping/pong pairs are opaque to the remainder of the protocol, hence
always commute with other messages.

As for my own middleware project, I think I will try to detect message
decoding issues in all cases by keeping the most recent two types of
each ID, and attempting to decode both ways (most recent first).  There
are fortunately a bunch of internal consistency checks that can be done,
such as length of overall message vs. length of args vs. string length
vs. null string termination, etc.  But if the middleware gets a message
that passes these decoding consistency checks for both of those types,
then depending on what it is trying to do (as in one of my use cases,
securing a sandboxed application), it may have to cut off the client.
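
The consistency checks described above can be sketched roughly as follows. This is only an illustration, not code from any existing tool: the signature letters follow the wayland-scanner style (u, i, o, n are one 32-bit word, s is a length-prefixed string whose length includes the NUL and whose content is padded to 4 bytes), little-endian byte order is assumed, and the names check_args, plausible_types, type_a and type_b are hypothetical.

```python
import struct

def check_args(payload, signature):
    """Return True if payload (message body without header) fits signature."""
    pos = 0
    for kind in signature:
        if pos + 4 > len(payload):
            return False              # ran out of bytes before the args ended
        (word,) = struct.unpack_from("<I", payload, pos)
        pos += 4
        if kind == "s" and word != 0:  # length 0 is treated as a null string
            padded = (word + 3) & ~3   # content is padded to a 32-bit boundary
            if pos + padded > len(payload):
                return False           # string would overrun the message
            if payload[pos + word - 1] != 0:
                return False           # missing NUL terminator
            pos += padded
    return pos == len(payload)         # args must account for every byte

def plausible_types(payload, opcode, candidates, signatures):
    """candidates: the remembered types for this ID, most recent first.
    signatures: hypothetical {(type, opcode): signature} table."""
    result = []
    for iface in candidates:
        sig = signatures.get((iface, opcode))
        if sig is not None and check_args(payload, sig):
            result.append(iface)
    return result
```

If plausible_types returns exactly one type, the message decodes unambiguously. If it returns both, the checks could not tell the two apart, which is the case where the middleware may have to cut off the client.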




Re: Questions about object ID lifetimes

2023-09-15 Thread Pekka Paalanen
On Thu, 14 Sep 2023 15:10:48 -0400
jleivent  wrote:

> On Thu, 14 Sep 2023 16:32:06 +0300
> Pekka Paalanen  wrote:
> 
> > 
> > As an aside, we collect unfixable issues under
> > https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next
> > These are issues that are either impossible or very difficult or
> > annoying to fix while keeping backward compatibility with both servers
> > and clients.  
> 
> Only 7 of them?

Some of them are bags of issues.

Feel free to collect your ideas in new issues though. You may find
interested people.

> > --
> > 
> > Object ID re-use is what I would call "aggressive": in the libwayland
> > C implementation, the object ID last freed is the first one to be
> > allocated next. There are two separate allocation ranges each with its
> > own free list: server and client allocated IDs.  
> 
> After I sent the initial post, I realized that the two separate
> ID ranges help in the following way:
> 
> For any object ID in the allocation range of side A, a destructor
> message from side B does not need acknowledgement.  This is because B
> can't introduce a new object bound to that ID, only A can.  Hence, any
> new_id arg for that ID is an acknowledgement of the destruction.

That's an interesting interpretation. Perhaps it works for your case,
but I would not use it in regular clients and servers, because I'd like to
be able to catch ID re-use errors like libwayland does.

> However, B has to be careful to ignore messages containing that ID
> until it sees one with the ID as a new_id arg.  After the destructor
> message from B but before a subsequent new_id for that ID from A, B
> should not use the ID as arguments to other messages (and attempts to
> do so can be dropped).  And this can be automated provided the
> destructor tag can be relied on.
> 
> > 
> > The C implementation also poses an additional restriction: a new ID
> > can be at most the largest ever allocated ID + 1.
> > 
> > All this is to keep the ID map as compact as possible without a hash
> > table. These details are in the implementation of the private 'struct
> > wl_map' in libwayland.  
> 
> Obviously, that helps middleware as well, for the same reasons.  It also
> makes more automatic error detection possible.
> 
> > ...
> > 
> > Your whole above analysis is completely correct!  
> 
> I was rather hoping things would turn out less complex than they
> seemed...
> 
> >   
> > > However, the other cases are not as easy to identify.
> > > 
> > > The other cases are:
> > > 1. an object created by a client request that has destructor events
> > > 2. an object created by the compositor
> > > 
> > > It might be true that case 1 does not exist.  Is there a general
> > > rule that such cases will never be considered in future
> > > expansions of the Wayland protocol?
> > 
> > Destructor events do exist. Tagging them as such in the XML was not
> > done from the beginning though, it was added later in a
> > backward-compatible manner which makes the tag more informational than
> > something libwayland could automatically process. The foremost example
> > is wl_callback.done event. This is only safe because it is guaranteed
> > that the client cannot be sending a request on wl_callback at the same
> > time the server is sending 'done' and destroying the object:
> > wl_callback has no requests defined at all.  
> 
> Fortunately, my point above about the advantage of the separate ID
> ranges helps here.  If wl_callback is created by the client, then a
> wl_callback.done event tagged as a destructor does not need
> acknowledgement AND is always safe provided that messages involving the
> wl_callback ID (other than its eventual reuse as a new_id arg) are
> ignored above libwayland.

I think there is another asymmetry here. libwayland-client definitely
ignores events on an object ID whose wl_proxy has been destroyed from
the API user point of view. libwayland-server though seems to be
throwing a protocol error immediately on any non-existing object ID
receiving a message.

I think there is a case that requires the delete_id event and cannot
work solely on the new_id ID re-use rule:
- client sends request to create wl_callback
- client destroys the wl_callback (no request, enters zombie state client side)
- client creates some other new object X

When the server destroys the wl_callback, it sends the done event, and
delete_id event, and cannot know if the client has already destroyed
the wl_callback or not.

Zombie IDs are not eligible for re-use. The zombie is destroyed and the
ID freed on delete_id event. If the client has not seen the delete_id
yet, it allocates a new high ID for object X. Otherwise, it re-uses the
wl_callback's old ID for X.

However, there is no delete_id in the opposite direction, meaning that
a similar situation in the opposite direction is racy and can lead to
confusing object IDs. I guess this is what you already found out.
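
The zombie scenario above can be modelled with a small sketch. It is a simplified model of the client-side bookkeeping, not the libwayland code itself: the class and method names are hypothetical, and only the behaviour described in this message is covered (a locally destroyed object stays a zombie until delete_id arrives, and only then does its ID become reusable).

```python
class ClientIds:
    """Toy model of client-side ID allocation with zombies."""

    def __init__(self):
        self.next_unused = 2           # ID 1 is wl_display on a real connection
        self.free = []                 # last freed is first reused
        self.zombies = set()           # destroyed locally, delete_id not yet seen
        self.deleted_by_server = set() # delete_id seen, still alive locally

    def create(self):
        if self.free:
            return self.free.pop()
        new_id = self.next_unused
        self.next_unused += 1
        return new_id

    def destroy_locally(self, obj_id):
        if obj_id in self.deleted_by_server:
            # The server already confirmed deletion, so the ID is free now.
            self.deleted_by_server.discard(obj_id)
            self.free.append(obj_id)
        else:
            # Keep the ID reserved until the server confirms.
            self.zombies.add(obj_id)

    def on_delete_id(self, obj_id):
        if obj_id in self.zombies:
            self.zombies.discard(obj_id)
            self.free.append(obj_id)
        else:
            self.deleted_by_server.add(obj_id)
```

With this model, creating a wl_callback, destroying it locally and then creating object X yields a new high ID for X. Only after on_delete_id has run for the callback does the next create() return the callback's old ID.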

Zombies are actually what allows lib

Re: Questions about object ID lifetimes

2023-09-14 Thread jleivent
On Thu, 14 Sep 2023 16:32:06 +0300
Pekka Paalanen  wrote:

> ...
> 
> congratulations, I think you may have found everything that is not
> quite right in the fundamental Wayland protocol design. :-)

Oh, you flatter me.  I'm sure there's more!

> 
> As an aside, we collect unfixable issues under
> https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next
> These are issues that are either impossible or very difficult or
> annoying to fix while keeping backward compatibility with both servers
> and clients.

Only 7 of them?

> 
> --
> 
> Object ID re-use is what I would call "aggressive": in the libwayland
> C implementation, the object ID last freed is the first one to be
> allocated next. There are two separate allocation ranges each with its
> own free list: server and client allocated IDs.

After I sent the initial post, I realized that the two separate
ID ranges help in the following way:

For any object ID in the allocation range of side A, a destructor
message from side B does not need acknowledgement.  This is because B
can't introduce a new object bound to that ID, only A can.  Hence, any
new_id arg for that ID is an acknowledgement of the destruction.
However, B has to be careful to ignore messages containing that ID
until it sees one with the ID as a new_id arg.  After the destructor
message from B but before a subsequent new_id for that ID from A, B
should not use the ID as arguments to other messages (and attempts to
do so can be dropped).  And this can be automated provided the
destructor tag can be relied on.
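
A minimal sketch of that bookkeeping, as a middleware tool might keep it, is shown here. It only illustrates the rule as proposed in this message (the reply to it points out that regular clients and servers would not want to rely on it), and the class and method names are hypothetical.

```python
class OppositeSideTracker:
    """Tracks IDs in side A's range that side B destroyed with a destructor."""

    def __init__(self):
        self.types = {}        # object ID -> interface name
        self.awaiting = set()  # destroyed by B, not yet re-introduced by A

    def on_new_id(self, obj_id, iface):
        # A new_id from side A doubles as the acknowledgement of destruction.
        self.awaiting.discard(obj_id)
        self.types[obj_id] = iface

    def on_destructor_from_peer(self, obj_id):
        # Relies on the destructor tag in the protocol XML being trustworthy.
        self.types.pop(obj_id, None)
        self.awaiting.add(obj_id)

    def should_drop(self, obj_id):
        # Messages naming the ID are ignored until A re-introduces it.
        return obj_id in self.awaiting
```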

> 
> The C implementation also poses an additional restriction: a new ID
> can be at most the largest ever allocated ID + 1.
> 
> All this is to keep the ID map as compact as possible without a hash
> table. These details are in the implementation of the private 'struct
> wl_map' in libwayland.

Obviously, that helps middleware as well, for the same reasons.  It also
makes more automatic error detection possible.

> ...
> 
> Your whole above analysis is completely correct!

I was rather hoping things would turn out less complex than they
seemed...

> 
> > However, the other cases are not as easy to identify.
> > 
> > The other cases are:
> > 1. an object created by a client request that has destructor events
> > 2. an object created by the compositor
> > 
> > It might be true that case 1 does not exist.  Is there a general
> > rule that such cases will never be considered in future
> > expansions of the Wayland protocol?
> 
> Destructor events do exist. Tagging them as such in the XML was not
> done from the beginning though, it was added later in a
> backward-compatible manner which makes the tag more informational than
> something libwayland could automatically process. The foremost example
> is wl_callback.done event. This is only safe because it is guaranteed
> that the client cannot be sending a request on wl_callback at the same
> time the server is sending 'done' and destroying the object:
> wl_callback has no requests defined at all.

Fortunately, my point above about the advantage of the separate ID
ranges helps here.  If wl_callback is created by the client, then a
wl_callback.done event tagged as a destructor does not need
acknowledgement AND is always safe provided that messages involving the
wl_callback ID (other than its eventual reuse as a new_id arg) are
ignored above libwayland.

But again, this means the destructor tag is important and not merely
informational.

I did notice that the destructor tagging was added mostly (or
solely) to help with code generation by wayland-scanner implementations
in programming languages where destructors require some specific
syntactic notation.

But maybe destructor tagging is even better than that?  Maybe it would
allow libwayland to automate more in a more robust way AND also allow
for middleware that doesn't have to simulate all of the semantic level
interactions induced by protocol messages in order to merely keep track
of how to decode messages.

> 
> It also requires that nothing passes an existing wl_callback object as
> an argument in any request. We have been merely lucky that no-one has
> done that. It's really hard to imagine a use case where you would want
> to pass an existing wl_callback to anything.

Again, the above separate ID ranges point addresses this, I think.

> 
> Extensions may have similar objects that only deliver some one-off
> events and then "self-destruct" by the final event. All this is simply
> documented and not marked in the XML.

That's what I was hoping to avoid.  If there are object types where
object lifetime can only be understood by simulating all of the
relevant semantic content of the messages involved, then that's not
good for middleware.  Isn't it also problematic towards the goals of
libwayland, because it makes it impossible for libwayland to ensure
that messages are properly decoded without trusting that the client
and/or compositor have implemen

Re: Questions about object ID lifetimes

2023-09-14 Thread Pekka Paalanen
On Wed, 13 Sep 2023 20:16:09 -0400
jleivent  wrote:

> Forgive the long post.  Tl;dr: what are the rules of object ID lifetime
> and reuse in the Wayland protocol?

Hi,

congratulations, I think you may have found everything that is not
quite right in the fundamental Wayland protocol design. :-)

As an aside, we collect unfixable issues under
https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next
These are issues that are either impossible or very difficult or
annoying to fix while keeping backward compatibility with both servers
and clients.

--

Object ID re-use is what I would call "aggressive": in the libwayland C
implementation, the object ID last freed is the first one to be
allocated next. There are two separate allocation ranges each with its
own free list: server and client allocated IDs.

The C implementation also poses an additional restriction: a new ID can
be at most the largest ever allocated ID + 1.

All this is to keep the ID map as compact as possible without a hash
table. These details are in the implementation of the private 'struct
wl_map' in libwayland.
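
The allocation rules above can be modelled roughly like this. It is a sketch of the behaviour only, not of the actual 'struct wl_map' layout (which threads its free list through the array entries); the class and method names are hypothetical, while 0xff000000 is the value of WL_SERVER_ID_START in libwayland.

```python
SERVER_ID_START = 0xFF000000  # first ID of the server-allocated range

class IdRange:
    """One allocation range (client or server) with its own free list."""

    def __init__(self, base):
        self.next_unused = base  # largest ever allocated ID + 1
        self.free = []           # LIFO: last freed is first allocated

    def allocate(self):
        if self.free:
            return self.free.pop()
        new_id = self.next_unused
        self.next_unused += 1
        return new_id

    def release(self, obj_id):
        self.free.append(obj_id)

    def accept_remote(self, obj_id):
        """Check an ID chosen by the peer against the same rules."""
        if obj_id in self.free:
            self.free.remove(obj_id)
            return True
        if obj_id == self.next_unused:  # at most largest ever allocated + 1
            self.next_unused += 1
            return True
        return False
```

A client range would be created as IdRange(1) and a server range as IdRange(SERVER_ID_START). Because of the "+ 1" restriction, the map stays dense and can be a plain array indexed by ID.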

> I am attempting to understand the rules of object ID lifetime within
> the Wayland protocol in order to construct Wayland middleware (similar
> to some of the tools featured on
> https://wayland.freedesktop.org/extras.html).  I could not find a
> comprehensive discussion of the details online.  If one exists, I would
> greatly appreciate a link!
> 
> Middleware tools that wish to decode Wayland messages sent between the
> compositor and its clients need to maintain an accurate mapping between
> object ID and object interface (type).  This is needed because the wire
> protocol's message header includes only the target object ID and an
> opcode that is relative to the object's type (the message header also
> includes the message length - about which I also have questions - to be
> pursued later...).  The message (request or event) and its argument
> encoding can only be determined if the object ID -> type and type +
> opcode -> message mappings are accurately maintained.  The type +
> opcode -> message mapping is static and can be extracted offline from
> the protocol XML files.
> 
> Since object IDs can be reused, it is important for the middleware to
> understand when an ID can be reused and when it cannot be to avoid
> errors in the ID -> type mapping.
> 
> Because the Wayland protocol is asynchronous, any message that implies
> destruction of an object should be acknowledged by the receiver before
> the destructed object's ID is reused.
> 
> Fortunately, certain events and requests have been tagged as
> destructors in the protocol descriptions!
> 
> Also fortunately, it appears (based on reading the wl_resource_destroy
> code in wayland-server.c) that for many object IDs, specifically for
> IDs of objects created by a client request (the ID appears as a new ID
> arg of a request, and is thus in the client side of the ID range) and
> for which the client makes a destructor request, the compositor will
> always send a wl_display::delete_id event (assuming the
> display_resource still exists for the client, which apparently would
> only not be the case after the client connection is severed) to
> acknowledge the destructor request. Any attempt to reuse that ID prior
> to the wl_display::delete_id event can lead to confusion, and should be
> avoided.  Reuse of the ID after the wl_display::delete_id event should
> not result in any confusion.
> 
> [BTW: for the purpose of this discussion, an object is "created" when
> it is introduced into a protocol message for the first time via a new_id
> argument.  It does not refer to the actual allocation of the object in
> memory or to its initialization.]

Your whole above analysis is completely correct!

> However, the other cases are not as easy to identify.
> 
> The other cases are:
> 1. an object created by a client request that has destructor events
> 2. an object created by the compositor
> 
> It might be true that case 1 does not exist.  Is there a general rule
> that such cases will never be considered in future expansions
> of the Wayland protocol?

Destructor events do exist. Tagging them as such in the XML was not
done from the beginning though, it was added later in a
backward-compatible manner which makes the tag more informational than
something libwayland could automatically process. The foremost example
is wl_callback.done event. This is only safe because it is guaranteed
that the client cannot be sending a request on wl_callback at the same
time the server is sending 'done' and destroying the object:
wl_callback has no requests defined at all.

It also requires that nothing passes an existing wl_callback object as
an argument in any request. We have been merely lucky that no-one has
done that. It's really hard to imagine a use case where you would want
to pass an existing wl_callback to anything.

Extensions may have similar objects 

Questions about object ID lifetimes

2023-09-13 Thread jleivent
Forgive the long post.  Tl;dr: what are the rules of object ID lifetime
and reuse in the Wayland protocol?

I am attempting to understand the rules of object ID lifetime within
the Wayland protocol in order to construct Wayland middleware (similar
to some of the tools featured on
https://wayland.freedesktop.org/extras.html).  I could not find a
comprehensive discussion of the details online.  If one exists, I would
greatly appreciate a link!

Middleware tools that wish to decode Wayland messages sent between the
compositor and its clients need to maintain an accurate mapping between
object ID and object interface (type).  This is needed because the wire
protocol's message header includes only the target object ID and an
opcode that is relative to the object's type (the message header also
includes the message length - about which I also have questions - to be
pursued later...).  The message (request or event) and its argument
encoding can only be determined if the object ID -> type and type +
opcode -> message mappings are accurately maintained.  The type +
opcode -> message mapping is static and can be extracted offline from
the protocol XML files.
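
To make the two mappings concrete, here is a small sketch of header decoding. The header is two 32-bit words: the object ID, then the message size in the upper 16 bits and the opcode in the lower 16 bits. The wire uses host byte order; little-endian is assumed here. The tables are hand-written and hypothetical stand-ins for what would be generated from the protocol XML.

```python
import struct

# Hypothetical tables; real middleware would build these from the XML files
# and from the new_id arguments it observes.
ID_TO_TYPE = {1: "wl_display"}
MESSAGES = {
    ("wl_display", "request", 0): "sync",
    ("wl_display", "request", 1): "get_registry",
    ("wl_display", "event", 0): "error",
    ("wl_display", "event", 1): "delete_id",
}

def decode_header(buf):
    """Split the 8-byte header into (object_id, opcode, size)."""
    object_id, word = struct.unpack_from("<II", buf, 0)
    return object_id, word & 0xFFFF, word >> 16

def identify(buf, direction):
    """Return (interface, message name, size), or None for an unknown ID."""
    object_id, opcode, size = decode_header(buf)
    iface = ID_TO_TYPE.get(object_id)
    if iface is None:
        return None  # without the ID -> type mapping, nothing can be decoded
    return iface, MESSAGES.get((iface, direction, opcode)), size
```

The None case is the heart of the problem: the header alone never says which interface the opcode belongs to.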

Since object IDs can be reused, it is important for the middleware to
understand when an ID can be reused and when it cannot be to avoid
errors in the ID -> type mapping.

Because the Wayland protocol is asynchronous, any message that implies
destruction of an object should be acknowledged by the receiver before
the destructed object's ID is reused.

Fortunately, certain events and requests have been tagged as
destructors in the protocol descriptions!

Also fortunately, it appears (based on reading the wl_resource_destroy
code in wayland-server.c) that for many object IDs, specifically for
IDs of objects created by a client request (the ID appears as a new ID
arg of a request, and is thus in the client side of the ID range) and
for which the client makes a destructor request, the compositor will
always send a wl_display::delete_id event (assuming the
display_resource still exists for the client, which apparently would
only not be the case after the client connection is severed) to
acknowledge the destructor request. Any attempt to reuse that ID prior
to the wl_display::delete_id event can lead to confusion, and should be
avoided.  Reuse of the ID after the wl_display::delete_id event should
not result in any confusion.
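
For this client-range case, the middleware bookkeeping might look like the sketch below. It assumes the middleware sees every message in both directions and can tell destructor requests from the XML tags; the class and method names are hypothetical.

```python
class ClientRangeMap:
    """ID -> type map for IDs created and destroyed by client requests."""

    def __init__(self):
        self.types = {}     # object ID -> interface name
        self.dying = set()  # destructor request seen, delete_id not yet seen

    def on_new_id(self, obj_id, iface):
        """Record a new object. Returns False if the ID was reused before
        its wl_display::delete_id was observed."""
        ok = obj_id not in self.types
        self.types[obj_id] = iface
        self.dying.discard(obj_id)
        return ok

    def on_destructor_request(self, obj_id):
        # Keep the type: events already in flight may still target this ID.
        self.dying.add(obj_id)

    def on_delete_id(self, obj_id):
        # The compositor's acknowledgement; the ID may now be reused safely.
        self.types.pop(obj_id, None)
        self.dying.discard(obj_id)
```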

[BTW: for the purpose of this discussion, an object is "created" when
it is introduced into a protocol message for the first time via a new_id
argument.  It does not refer to the actual allocation of the object in
memory or to its initialization.]

However, the other cases are not as easy to identify.

The other cases are:
1. an object created by a client request that has destructor events
2. an object created by the compositor

It might be true that case 1 does not exist.  Is there a general rule
that such cases will never be considered in future expansions
of the Wayland protocol?

For objects created by the compositor, there are 2 subcases:

2a. objects with only destructor events
2b. objects with destructor requests

Again, it might be the case that 2b does not exist, as it is analogous
to case 1 above.  But, is there a general rule against such
future cases as well?  Combining 1 and 2b, is there a general rule that
says that only the object creator can initiate an object's destruction
(unprovoked by the other side of the protocol)?

For object IDs created by the compositor and with only destructor
events (case 2a), it may be necessary to understand the details of each
interface in question to decide when the ID can be reused, as there is
no universal destructor acknowledgement request comparable to the
wl_display::delete_id event.  A requirement to understand the details
to that level would make middleware development more difficult.  Insert
extreme sadness emoji here.

Thankfully, it seems that destructor events are themselves
acknowledgements of requests for destruction by the client (such as
wp_drm_lease_device_v1::released event destructor vs.
wp_drm_lease_device_v1::release request), or involve objects with a
very limited lifetime and usage, such as callbacks
(wp_presentation_feedback, zwp_linux_buffer_release, and
zwp_fullscreen_shell_mode_feedback_v1).  These limited lifetime/usage
objects are created with the knowledge that all messages for them are
destructor events, and that they are not involved in any other messages
(as targets or arguments).  Hence their destruction needs no further
acknowledgement because the request for destruction was implied by
their creation.  The destructor event is the acknowledgement of that
request.

Is this a general rule: that a destructor event is always the
acknowledgement of a (perhaps implied) destruction request?

So there may be two general simple rules that the middleware can follow
to maintain a proper ID -> type mapping through ID reuse cycles:

1. reuse of ID is allowed after w