On Wed, 14 Sep 2016 18:20:10 +0200
Olivier Fourdan <ofour...@redhat.com> wrote:

> wl_display_flush() can fail with EAGAIN and Xwayland would make this a
> fatal error.
> 
> When this happens, it means that Xwayland has flooded the Wayland file
> descriptor, either because the Wayland compositor cannot cope or more
> likely because of a deadlock situation where the Wayland compositor is
> blocking, waiting for an X reply while Xwayland tries to write data to
> the Wayland file descriptor.
> 
> The general consensus to avoid the deadlock is for the Wayland
> compositor to never issue blocking X11 roundtrips, but in practice
> blocking rountrips can occur in various places, including Xlib calls
> themselves so this is not always achievable without major surgery in the
> Wayland compositor/Window manager.

Hi Olivier,

all the rationale in this commit message looks good to me, just some
things I'd like to ask more about.

> What this patch does is to avoid dispatching to the Wayland file
> descriptor until it becomes available for writing again, while at the
> same time continue processing X11 requests to release the deadlock.

Could you expand on why the Wayland fd should not be dispatched while
you wait for it to flush?

Is it just because dispatching it is likely to cause more Wayland
requests to be sent? I wonder how that holds up. Maybe it doesn't
matter as the premise is that the Wayland compositor is stuck.

> This is not perfect, as there is still the possibility of another X
> client hammering the connection and we'll still fail writing to the
> Wayland connection eventually, but this improves things enough to avoid
> a 100% repeatable crash with vlc and gtkperf.
> 
> Also, it is worth considering that window managers and Wayland
> compositors such as mutter already have a higher priority than other
> regular X clients thanks to XSyncSetPriority(), mitigating the risk.

The blocking X11 calls from the Wayland compositor - are they all of
the nature that the X server will reply to them immediately when it
reads them, or could the reply be delayed allowing other X11 clients to
be serviced in the mean time?

If all replies are immediate, it sounds like this would be a fairly
reliable fix after all.


Thanks,
pq

> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1278159
> Bugzilla: https://bugzilla.gnome.org/show_bug.cgi?id=763400
> Signed-off-by: Olivier Fourdan <ofour...@redhat.com>
> ---
>  v2: oops, need to poll() on EAGAIN between retries
>  v3: Mostly a rewrite, non-blocking on poll()
> 
>  hw/xwayland/xwayland.c | 34 +++++++++++++++++++++++++++++++---
>  hw/xwayland/xwayland.h |  1 +
>  2 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/xwayland/xwayland.c b/hw/xwayland/xwayland.c
> index 847321e..2efbff8 100644
> --- a/hw/xwayland/xwayland.c
> +++ b/hw/xwayland/xwayland.c
> @@ -33,6 +33,7 @@
>  #include <compositeext.h>
>  #include <glx_extinit.h>
>  #include <os.h>
> +#include <xserver_poll.h>
>  
>  #ifdef XF86VIDMODE
>  #include <X11/extensions/xf86vmproto.h>
> @@ -470,6 +471,9 @@ xwl_read_events (struct xwl_screen *xwl_screen)
>  {
>      int ret;
>  
> +    if (xwl_screen->wait_flush)
> +        return;
> +
>      ret = wl_display_read_events(xwl_screen->display);
>      if (ret == -1)
>          FatalError("failed to read Wayland events: %s\n", strerror(errno));
> @@ -481,10 +485,25 @@ xwl_read_events (struct xwl_screen *xwl_screen)
>          FatalError("failed to dispatch Wayland events: %s\n", 
> strerror(errno));
>  }
>  
> +static int
> +xwl_display_pollout (struct xwl_screen *xwl_screen, int timeout)
> +{
> +    struct pollfd poll_fd;
> +
> +    poll_fd.fd = wl_display_get_fd(xwl_screen->display);
> +    poll_fd.events = POLLOUT;
> +
> +    return xserver_poll(&poll_fd, 1, timeout);
> +}
> +
>  static void
>  xwl_dispatch_events (struct xwl_screen *xwl_screen)
>  {
> -    int ret;
> +    int ret = 0;
> +    int ready;
> +
> +    if (xwl_screen->wait_flush)
> +        goto pollout;
>  
>      while (xwl_screen->prepare_read == 0 &&
>             wl_display_prepare_read(xwl_screen->display) == -1) {
> @@ -496,9 +515,18 @@ xwl_dispatch_events (struct xwl_screen *xwl_screen)
>  
>      xwl_screen->prepare_read = 1;
>  
> -    ret = wl_display_flush(xwl_screen->display);
> -    if (ret == -1)
> +pollout:
> +    ready = xwl_display_pollout(xwl_screen, 5);
> +    if (ready == -1)
> +        FatalError("error polling on XWayland fd: %s\n", strerror(errno));
> +
> +    if (ready > 0)
> +        ret = wl_display_flush(xwl_screen->display);
> +
> +    if (ret == -1 && errno != EAGAIN)
>          FatalError("failed to write to XWayland fd: %s\n", strerror(errno));
> +
> +    xwl_screen->wait_flush = (ready == 0 || ret == -1);
>  }
>  
>  static void
> diff --git a/hw/xwayland/xwayland.h b/hw/xwayland/xwayland.h
> index db3dd0b..3ce7a63 100644
> --- a/hw/xwayland/xwayland.h
> +++ b/hw/xwayland/xwayland.h
> @@ -83,6 +83,7 @@ struct xwl_screen {
>  #define XWL_FORMAT_RGB565   (1 << 2)
>  
>      int prepare_read;
> +    int wait_flush;
>  
>      char *device_name;
>      int drm_fd;

Attachment: pgpVPwUGv89YM.pgp
Description: OpenPGP digital signature

_______________________________________________
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel

Reply via email to