Re: why not flow control in wl_connection_flush?
On Fri, 1 Mar 2024 11:59:36 +0200 Pekka Paalanen wrote:

> ...
> The real problem here is though, how do you architect your app or
> toolkit so that it can stop and postpone what it is doing with Wayland
> when you detect that the socket is not draining fast enough? You might
> be calling into Wayland using libraries that do not support this.
>
> Returning to the main event loop is the natural place to check and
> postpone, but this whole issue stems from the reality that apps do not
> return to their main loop often enough or do not know how to wait with
> sending even in the main loop.

My conclusion from this discussion is that clients cannot be relied on to avoid causing problems if they attempt to send too fast.

I think I may add an option to wl_connection_flush in my copy of libwayland so that I can turn on client waiting on flush from an env var. It looks like the change would be pretty small. Unless you think it would be worth making this an MR on its own?

If the client is single threaded, this will cause the whole client to wait, which probably won't be a problem, considering the type of clients that might try to be that fast. If the client isn't single threaded, then it may cause a thread to wait that the client doesn't expect to wait, which could be a problem for that client, admittedly.
Re: why not flow control in wl_connection_flush?
On Thu, 29 Feb 2024 16:32:26 -0500 jleivent wrote:

> On Tue, 27 Feb 2024 11:01:18 +0200
> Pekka Paalanen wrote:
>
> > > But suppose I'm writing a client that has the possibility of
> > > sending a rapid stream of msgs to the compositor that might be,
> > > depending on what other clients are doing, too fast for the
> > > compositor to handle, and I'd like to implement some flow control.
> > > I don't want the connection to the compositor to sever or for the
> > > condition to cause memory consumption without any ability for me to
> > > find out about and control the situation. Especially if I could
> > > slow down that rapid stream of updates without too high a cost to
> > > what my client is trying to accomplish.
> > >
> > > Is there a way I could do that?
> >
> > Get the Wayland fd with wl_display_get_fd() and poll it for writable.
> > If it's not writable, you're sending too fast.
>
> That's what I was looking for! I think... Maybe?
>
> > That's what any client should always do. Usually it would be prompted
> > by wl_display_flush() erroring out with EAGAIN as your cue to start
> > polling for writable. It's even documented.
>
> But, calling wl_display_flush too often is bad for throughput, right?

I don't think it is, as long as you're not calling it in a busy-loop. Have a look through its implementation. If you mean calling it after every single request you send, then yeah, I would not do that, even though I don't think it would have a measurable performance impact. The power consumption impact might be measurable. There is a wl_display mutex, though, so if you hammer it from many threads, I guess it could hurt.

> Isn't it better to allow the ring buffer to determine itself when to
> flush based on being full, and batch send many msgs? Obviously
> sometimes the client has nothing more to send (for a while), so
> wl_display_flush then makes sense. But in this case, it does have more
> to send and wants to know if it should attempt to do so or hold back.
Normally, your program returns to its main event loop relatively fast; definitely between each drawn frame, if we consider just one wl_surface at a time. That is the natural place and granularity to flush. You must flush before polling the Wayland fd for readable, because otherwise you might wait forever for replies that never come, because the compositor never received your request.

The automatic flush when the libwayland send buffer (which is 4 kB so far) gets full just buys you time to do your work and return to your main event loop to flush and sleep. This flush is the problematic one, because if it fails, there is no recovery at this time. We cannot return a "hold up, do something else for a while instead of sending requests", and there is no place to put the current request, so the only way out is to disconnect. The socket buffer usually defaulting to something like 200 kB is a saviour, I believe.

> I could instead wait for the display fd to be writable before
> attempting each msg send. But the display fd may be writable merely
> because the ring buffer hasn't tried flushing yet. But the ring buffer
> could have less than enough space for the msg I'm about to send. And
> the socket buffer could have very little space - just enough for it to
> say it's writable.
>
> Which means that sometimes polling the display fd will return writable
> when an attempt to send a msg is still going to result in ring buffer
> growth or client disconnect.

Right. So you need to try to flush first, to ensure the libwayland ring buffer is empty and can hold your message. You cannot send messages larger than the ring buffer anyway.

> So... back to calling wl_display_flush ... sometimes.
>
> I guess I could call wl_display_flush after what I think is about 4K
> worth of msg content. wl_display_flush returns the amount sent, so
> that helps keep the extra state I need to maintain.
>
> Is there currently a way I could get the size of contents in the output
> ring buffer?
I don't recall any. The MRs for solving this problem have considered adding API for the app to see when the ring buffer is getting a little too full.

The real problem here is though, how do you architect your app or toolkit so that it can stop and postpone what it is doing with Wayland when you detect that the socket is not draining fast enough? You might be calling into Wayland using libraries that do not support this.

Returning to the main event loop is the natural place to check and postpone, but this whole issue stems from the reality that apps do not return to their main loop often enough or do not know how to wait with sending even in the main loop.

Thanks,
pq
Re: why not flow control in wl_connection_flush?
On Tue, 27 Feb 2024 11:01:18 +0200 Pekka Paalanen wrote:

> > But suppose I'm writing a client that has the possibility of
> > sending a rapid stream of msgs to the compositor that might be,
> > depending on what other clients are doing, too fast for the
> > compositor to handle, and I'd like to implement some flow control.
> > I don't want the connection to the compositor to sever or for the
> > condition to cause memory consumption without any ability for me to
> > find out about and control the situation. Especially if I could
> > slow down that rapid stream of updates without too high a cost to
> > what my client is trying to accomplish.
> >
> > Is there a way I could do that?
>
> Get the Wayland fd with wl_display_get_fd() and poll it for writable.
> If it's not writable, you're sending too fast.

That's what I was looking for! I think... Maybe?

> That's what any client should always do. Usually it would be prompted
> by wl_display_flush() erroring out with EAGAIN as your cue to start
> polling for writable. It's even documented.

But, calling wl_display_flush too often is bad for throughput, right? Isn't it better to allow the ring buffer to determine itself when to flush, based on being full, and batch-send many msgs? Obviously sometimes the client has nothing more to send (for a while), so wl_display_flush then makes sense. But in this case, it does have more to send and wants to know if it should attempt to do so or hold back.

I could instead wait for the display fd to be writable before attempting each msg send. But the display fd may be writable merely because the ring buffer hasn't tried flushing yet. But the ring buffer could have less than enough space for the msg I'm about to send. And the socket buffer could have very little space - just enough for it to say it's writable.

Which means that sometimes polling the display fd will return writable when an attempt to send a msg is still going to result in ring buffer growth or client disconnect.

So...
back to calling wl_display_flush ... sometimes.

I guess I could call wl_display_flush after what I think is about 4K worth of msg content. wl_display_flush returns the amount sent, so that helps keep the extra state I need to maintain.

Is there currently a way I could get the size of the contents in the output ring buffer?
Re: why not flow control in wl_connection_flush?
On Mon, 26 Feb 2024 13:23:01 -0500 jleivent wrote:

> On Mon, 26 Feb 2024 15:12:23 +0200
> Pekka Paalanen wrote:
> > ...
> > > What is the advantage to having the impacted clients grow their send
> > > buffers while waiting? They wait either way.
> >
> > They are not waiting if they are growing their send buffers.
>
> I meant that they must wait for the UI to update corresponding to the
> messages they are trying to send to the compositor.
>
> This may also be about my assumption of a threading model, this time
> for the client. I assume that a client that has some important work to
> do that is unrelated to updating the display will do that work in a
> distinct thread from the one dedicated to sending display related msgs
> to the compositor.

That's what they definitely should do, if their other work ends up blocking the thread. But if their other work is fast enough, then we make no assumptions about using threads. Given that lots of software is written in C, where threading creates more problems than it solves, allowing apps to be single-threaded is a good idea to me.

> If that's not the case, then indeed causing the client's sending thread
> to wait could impact some other computation. Which might be bad,
> depending on what that other computation is trying to do.

Many (most?) games and game framework libraries are prime examples of difficult requirements. They can be "single"-threaded, which means we should not block them, but at the same time their main loop may be a busy loop of drawing new frames, with very specific expectations on when and where the window system should block them to throttle their framerate, if the game so wants. Sometimes users do not want throttling, but they will get it anyway, because e.g. EGL and Vulkan have a limit on the number of framebuffers per surface, so when you run out, the app gets blocked anyway, waiting for some buffer to be released from the compositor before drawing can continue.
You can also write an app such that there is no limit on the number of framebuffers, but then you risk OOM instead.

> But suppose I'm writing a client that has the possibility of sending a
> rapid stream of msgs to the compositor that might be, depending on what
> other clients are doing, too fast for the compositor to handle, and I'd
> like to implement some flow control. I don't want the connection to
> the compositor to sever or for the condition to cause memory
> consumption without any ability for me to find out about and control
> the situation. Especially if I could slow down that rapid stream of
> updates without too high a cost to what my client is trying to
> accomplish.
>
> Is there a way I could do that?

Get the Wayland fd with wl_display_get_fd() and poll it for writable. If it's not writable, you're sending too fast.

That's what any client should always do. Usually it would be prompted by wl_display_flush() erroring out with EAGAIN as your cue to start polling for writable. It's even documented.

I haven't followed the details of the libwayland message buffering MR, but I would expect it to give you more grace time to poll, so you don't have to go out of your way to check at inconvenient times.

And then there are the clients who don't care to poll for writable at all.

Thanks,
pq
Re: why not flow control in wl_connection_flush?
On Mon, 26 Feb 2024 15:12:23 +0200 Pekka Paalanen wrote:

...
> > What is the advantage to having the impacted clients grow their send
> > buffers while waiting? They wait either way.
>
> They are not waiting if they are growing their send buffers.

I meant that they must wait for the UI to update corresponding to the messages they are trying to send to the compositor.

This may also be about my assumption of a threading model, this time for the client. I assume that a client that has some important work to do that is unrelated to updating the display will do that work in a distinct thread from the one dedicated to sending display related msgs to the compositor.

If that's not the case, then indeed causing the client's sending thread to wait could impact some other computation. Which might be bad, depending on what that other computation is trying to do.

But suppose I'm writing a client that has the possibility of sending a rapid stream of msgs to the compositor that might be, depending on what other clients are doing, too fast for the compositor to handle, and I'd like to implement some flow control. I don't want the connection to the compositor to sever or for the condition to cause memory consumption without any ability for me to find out about and control the situation. Especially if I could slow down that rapid stream of updates without too high a cost to what my client is trying to accomplish.

Is there a way I could do that?
Re: why not flow control in wl_connection_flush?
On Sat, 24 Feb 2024 15:35:27 -0500 jleivent wrote:

> On Fri, 23 Feb 2024 12:12:38 +0200
> Pekka Paalanen wrote:
>
> > I would think it to be quite difficult for a compositor to dedicate a
> > whole thread for each client.
>
> But that means it is possible that the server cannot catch up for long
> periods. And that just having a large number of otherwise friendly
> clients can cause their socket buffers to fill up. And things are
> worse on systems with more cores.

That can be true. Let me know when you hit that. It will be interesting to hear what it takes to happen. Even more interesting will be to hear whether threading the client connection handling actually helps, or whether the compositor is choking on its own operations, like repaint, which needs a snapshot of all clients' state, because there are simply too many surfaces up.

> What is the advantage to having the impacted clients grow their send
> buffers while waiting? They wait either way.

They are not waiting if they are growing their send buffers.

Thanks,
pq
Re: why not flow control in wl_connection_flush?
On Fri, 23 Feb 2024 12:12:38 +0200 Pekka Paalanen wrote:

> I would think it to be quite difficult for a compositor to dedicate a
> whole thread for each client.

But that means it is possible that the server cannot catch up for long periods. And that just having a large number of otherwise friendly clients can cause their socket buffers to fill up. And things are worse on systems with more cores.

What is the advantage to having the impacted clients grow their send buffers while waiting? They wait either way.
Re: why not flow control in wl_connection_flush?
On Thu, 22 Feb 2024 10:26:00 -0500 jleivent wrote:

> Thanks for this response. I am considering adding unbounded buffering
> to my Wayland middleware project, and wanted to consider the flow
> control options first. Walking through the reasoning here is very
> helpful. I didn't know that there was a built-in expectation that
> clients would do some of their own flow control. I was also operating
> under the assumption that blocking flushes from the compositor to
> one client would not have an impact on other clients (I was assuming an
> appropriate threading model in compositors).

I would think it to be quite difficult for a compositor to dedicate a whole thread for each client. There would be times, like repainting an output, where you would need to freeze or snapshot all relevant surfaces' state, blocking the handling of client requests. I'm sure it could be done, but due to the complexity a highly threaded design would cause, most compositors to my perception have just opted for an approximately single-threaded design, maybe not least because Wayland explicitly aims to support that. Wayland requests are intended to be very fast to serve, so threading them per client should not be necessary.

> The client OOM issue, though: A malicious client can do all kinds of
> things to try to get DoS, and moving towards OOM would accomplish that
> as well on systems with sufficient speed disadvantages for thrashing.
> A buggy client that isn't trying to do anything malicious, but is
> trapped in a send loop, that would be a case where causing it to wait
> might be better than allowing it to move towards OOM (and thrash).

Where do you draw the line between being "stuck in a loop", "doing something stupid but still workable and legit", and "doing what it simply needs to do"?

One example of an innocent client overflowing its sending is Xwayland, where X11 clients cause wl_surface damage in an extremely fragmented way.
It might result in thousands of tiny damage rectangles, and it could happen in multiple X11 windows simultaneously. If all that damage were relayed as-is, it is very possible the Wayland socket would overflow. (To work around that, there is a limit in Xwayland on how many rects it is willing to forward, and when that is exceeded, it falls back to a single bounding box of damage, IIRC.) Blocking might be ok for Xwayland, perhaps, so it is not the best example in that sense.

A client could also be trapped in an unthrottled repaint loop, where it allocates pixel buffers without limit because the compositor is not releasing them as fast, and the general idea is that if you need to draw and you don't have a free pixel buffer, you allocate a new one. It's up to the client to limit itself to a reasonable number of pixel buffers per surface, and that number is not 2. It's probably 4, usually. A reasonable number could be even more, depending.

Thanks,
pq

> On Thu, 22 Feb 2024 11:52:28 +0200
> Pekka Paalanen wrote:
>
> > On Wed, 21 Feb 2024 11:08:02 -0500
> > jleivent wrote:
> >
> > > Not completely blocking makes sense for the compositor, but why not
> > > block the client?
> >
> > Blocking in clients is indeed less of a problem, but:
> >
> > - Clients do not usually have requests they *have to* send to the
> >   compositor even if the compositor is not responding timely, unlike
> >   input events that compositors have; a client can spam surfaces all
> >   it wants, but it is just throwing work away if it does it faster than
> >   the screen can update. So there is some built-in expectation that
> >   clients control their sending.
> >
> > - I think the Wayland design wants to give full asynchronicity for
> >   clients as well, never blocking them unless they explicitly choose
> >   to wait for an event. A client might have semi-real-time
> >   responsibilities as well.
> >
> > - A client's send buffer could be infinite.
> >   If a client chooses to
> >   send requests so fast it hits OOM, it is just DoS'ing itself.
> >
> > > For the compositor, wouldn't a timeout in the sendmsg make sense?
> >
> > That would make both problems: slight blocking multiplied by number of
> > (stalled) clients, and overflows. That could lead to jittery user
> > experience while not eliminating the overflow problem.
> >
> > Thanks,
> > pq
Re: why not flow control in wl_connection_flush?
Thanks for this response. I am considering adding unbounded buffering to my Wayland middleware project, and wanted to consider the flow control options first. Walking through the reasoning here is very helpful. I didn't know that there was a built-in expectation that clients would do some of their own flow control. I was also operating under the assumption that blocking flushes from the compositor to one client would not have an impact on other clients (I was assuming an appropriate threading model in compositors).

The client OOM issue, though: a malicious client can do all kinds of things to try to get DoS, and moving towards OOM would accomplish that as well on systems with sufficient speed disadvantages for thrashing. A buggy client that isn't trying to do anything malicious, but is trapped in a send loop, would be a case where causing it to wait might be better than allowing it to move towards OOM (and thrash).

On Thu, 22 Feb 2024 11:52:28 +0200 Pekka Paalanen wrote:

> On Wed, 21 Feb 2024 11:08:02 -0500
> jleivent wrote:
>
> > Not completely blocking makes sense for the compositor, but why not
> > block the client?
>
> Blocking in clients is indeed less of a problem, but:
>
> - Clients do not usually have requests they *have to* send to the
>   compositor even if the compositor is not responding timely, unlike
>   input events that compositors have; a client can spam surfaces all
>   it wants, but it is just throwing work away if it does it faster than
>   the screen can update. So there is some built-in expectation that
>   clients control their sending.
>
> - I think the Wayland design wants to give full asynchronicity for
>   clients as well, never blocking them unless they explicitly choose
>   to wait for an event. A client might have semi-real-time
>   responsibilities as well.
>
> - A client's send buffer could be infinite. If a client chooses to
>   send requests so fast it hits OOM, it is just DoS'ing itself.
>
> > For the compositor, wouldn't a timeout in the sendmsg make sense?
>
> That would make both problems: slight blocking multiplied by number of
> (stalled) clients, and overflows. That could lead to jittery user
> experience while not eliminating the overflow problem.
>
> Thanks,
> pq
Re: why not flow control in wl_connection_flush?
On Wed, 21 Feb 2024 11:08:02 -0500 jleivent wrote:

> Not completely blocking makes sense for the compositor, but why not
> block the client?

Blocking in clients is indeed less of a problem, but:

- Clients do not usually have requests they *have to* send to the
  compositor even if the compositor is not responding timely, unlike
  input events that compositors have; a client can spam surfaces all
  it wants, but it is just throwing work away if it does it faster than
  the screen can update. So there is some built-in expectation that
  clients control their sending.

- I think the Wayland design wants to give full asynchronicity for
  clients as well, never blocking them unless they explicitly choose
  to wait for an event. A client might have semi-real-time
  responsibilities as well.

- A client's send buffer could be infinite. If a client chooses to
  send requests so fast it hits OOM, it is just DoS'ing itself.

> For the compositor, wouldn't a timeout in the sendmsg make sense?

That would make both problems: slight blocking multiplied by number of (stalled) clients, and overflows. That could lead to jittery user experience while not eliminating the overflow problem.

Thanks,
pq

> On Wed, 21 Feb 2024 16:39:08 +0100
> Olivier Fourdan wrote:
>
> > Hi,
> >
> > On Wed, Feb 21, 2024 at 4:21 PM jleivent wrote:
> >
> > > I've been looking into some of the issues about allowing the
> > > socket's kernel buffer to run out of space, and was wondering why
> > > not simply remove MSG_DONTWAIT from the sendmsg call in
> > > wl_connection_flush? That should implement flow control by having
> > > the sender thread wait until the receiver has emptied the socket's
> > > buffer sufficiently.
> > >
> > > It seems to me that using an unbounded buffer could cause memory
> > > resource problems on whichever end was using that buffer.
> > >
> > > Was removing MSG_DONTWAIT from the sendmsg call considered and
> > > abandoned for some reason?
> >
> > See this thread [1] from 2012, it might give some hint on why
> > MSG_DONTWAIT was added with commit b26774da [2].
> >
> > HTH
> > Olivier
> >
> > [1] https://lists.freedesktop.org/archives/wayland-devel/2012-February/002394.html
> > [2] https://gitlab.freedesktop.org/wayland/wayland/-/commit/b26774da
Re: why not flow control in wl_connection_flush?
Not completely blocking makes sense for the compositor, but why not block the client?

For the compositor, wouldn't a timeout in the sendmsg make sense?

On Wed, 21 Feb 2024 16:39:08 +0100 Olivier Fourdan wrote:

> Hi,
>
> On Wed, Feb 21, 2024 at 4:21 PM jleivent wrote:
>
> > I've been looking into some of the issues about allowing the
> > socket's kernel buffer to run out of space, and was wondering why
> > not simply remove MSG_DONTWAIT from the sendmsg call in
> > wl_connection_flush? That should implement flow control by having
> > the sender thread wait until the receiver has emptied the socket's
> > buffer sufficiently.
> >
> > It seems to me that using an unbounded buffer could cause memory
> > resource problems on whichever end was using that buffer.
> >
> > Was removing MSG_DONTWAIT from the sendmsg call considered and
> > abandoned for some reason?
>
> See this thread [1] from 2012, it might give some hint on why
> MSG_DONTWAIT was added with commit b26774da [2].
>
> HTH
> Olivier
>
> [1] https://lists.freedesktop.org/archives/wayland-devel/2012-February/002394.html
> [2] https://gitlab.freedesktop.org/wayland/wayland/-/commit/b26774da
Re: why not flow control in wl_connection_flush?
Hi,

On Wed, Feb 21, 2024 at 4:21 PM jleivent wrote:

> I've been looking into some of the issues about allowing the socket's
> kernel buffer to run out of space, and was wondering why not simply
> remove MSG_DONTWAIT from the sendmsg call in wl_connection_flush? That
> should implement flow control by having the sender thread wait until
> the receiver has emptied the socket's buffer sufficiently.
>
> It seems to me that using an unbounded buffer could cause memory
> resource problems on whichever end was using that buffer.
>
> Was removing MSG_DONTWAIT from the sendmsg call considered and abandoned
> for some reason?

See this thread [1] from 2012, it might give some hint on why MSG_DONTWAIT was added with commit b26774da [2].

HTH
Olivier

[1] https://lists.freedesktop.org/archives/wayland-devel/2012-February/002394.html
[2] https://gitlab.freedesktop.org/wayland/wayland/-/commit/b26774da
why not flow control in wl_connection_flush?
I've been looking into some of the issues about allowing the socket's kernel buffer to run out of space, and was wondering why not simply remove MSG_DONTWAIT from the sendmsg call in wl_connection_flush? That should implement flow control by having the sender thread wait until the receiver has emptied the socket's buffer sufficiently.

It seems to me that using an unbounded buffer could cause memory resource problems on whichever end was using that buffer.

Was removing MSG_DONTWAIT from the sendmsg call considered and abandoned for some reason?