[Touch-packages] [Bug 1782984] Re: Assertion `!xcb_xlib_threads_sequence_lost' failed with multiple applications

Timo Aaltonen Fri, 13 Aug 2021 06:20:54 -0700

** Description changed:

+ [Impact]
+ 
+ There is a race in libx11 causing applications to randomly abort. It's
+ not trivial to reproduce, but there are enough duplicates that this
+ deserves an SRU to bionic & focal.
+ 
+ [Fix]
+ 
+ Backport a commit from upstream:
+ 
+ From dbb55e1a5e82870466b095097d9e46046680ec25 Mon Sep 17 00:00:00 2001
+ From: Frediano Ziglio <fzig...@redhat.com>
+ Date: Wed, 29 Jan 2020 09:06:54 +0000
+ Subject: [PATCH] Fix poll_for_response race condition
+ 
+ In poll_for_response it is possible that event replies are skipped
+ and a more up-to-date message reply is returned.
+ This will cause the next poll_for_event call to fail, aborting the program.
+ 
+ This was reproduced using a slow ssh tunnel or a program that slows
+ down server replies (I used a combination of xtrace and strace).
+ 
+ How the race happens:
+ - the program enters poll_for_response;
+ - poll_for_event is called, but the server has not yet sent the reply;
+ - pending_requests is not NULL because we sent a request (see the call
+   to append_pending_request in _XSend);
+ - xcb_poll_for_reply64 is called from poll_for_response;
+ - xcb_poll_for_reply64 reads from the server; at this point the
+   server replies with an event (say sequence N) and the reply to our
+   last request (say sequence N+1);
+ - xcb_poll_for_reply64 returns the reply for the request we asked;
+ - last_request_read is set to sequence N+1 in poll_for_response;
+ - poll_for_response returns the response to the request;
+ - poll_for_event is called (for instance from another poll_for_response);
+ - the event with sequence N is retrieved;
+ - the sequence N is widened; however, as the "new" number computed from
+   last_request_read is less than last_request_read itself (N < N+1),
+   the number is widened to N + 2^32 (assuming last_request_read still
+   fits in 32 bits);
+ - poll_for_event enters the nested if statement as req is NULL;
+ - we compare the widened N (which now does not fit into 32 bits) with
+   request (which fits into 32 bits), hitting throw_thread_fail_assert.
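
The widening step in the list above can be sketched in a few lines of Python. This is an illustrative model only, not code from libX11: the `widen` helper, the `MASK32` constant, and the example value of N are assumptions made for the example, and the final comparison only mimics the check that the commit message says ends in throw_thread_fail_assert.

```python
MASK32 = 0xFFFFFFFF  # low 32 bits: the sequence number as sent on the wire


def widen(last_request_read: int, narrow: int) -> int:
    """Extend a 32-bit sequence number to 64 bits.

    Hypothetical model: the upper 32 bits are borrowed from
    last_request_read, and a value that appears to have gone
    backwards is treated as a 32-bit wraparound.
    """
    widened = (last_request_read & ~MASK32) | (narrow & MASK32)
    if widened < last_request_read:
        widened += 1 << 32  # assume the 32-bit counter wrapped
    return widened


N = 1000         # arbitrary example sequence number for the event
request = N + 1  # last request sent; its reply was returned first

in_order = widen(N - 1, N)  # event handled before the reply
raced = widen(N + 1, N)     # event handled after last_request_read = N+1

print(in_order == N)       # True
print(raced == N + 2**32)  # True
print(raced > request)     # True: the condition that trips the assertion
```

When events are handled in order, the widened value stays at N. Once last_request_read has advanced past N, the same event is pushed a full 2^32 ahead and compares greater than the current request number.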
+ 
+ To avoid the race condition and to keep the sequence from going
+ backwards, I check again for new events after getting the response
+ and, if one is present, return that event and save the reply to
+ return it later.
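
The idea behind the fix can be illustrated with a small simulation. This is a hedged sketch in Python, not the upstream patch: `FakeConnection`, `FixedPoller`, `poll_for_response_buggy`, `server_traffic`, and the sequence numbers 1000 and 1001 are all hypothetical names and values chosen for the example. It assumes the server's event (N) and reply (N+1) arrive together on the second read, as in the race described above.

```python
from collections import deque


class FakeConnection:
    """Hypothetical stand-in for the XCB connection (not a real API)."""

    def __init__(self, batches):
        self.batches = deque(batches)  # data the server sends, one batch per read
        self.events = deque()          # events read but not yet returned
        self.replies = {}              # replies read, keyed by sequence number

    def _read(self):
        # Each read consumes whatever the server has sent so far.
        if self.batches:
            for kind, seq in self.batches.popleft():
                if kind == "event":
                    self.events.append(seq)
                else:
                    self.replies[seq] = seq

    def poll_for_event(self):
        self._read()
        return ("event", self.events.popleft()) if self.events else None

    def poll_for_reply(self, seq):
        self._read()
        return ("reply", self.replies.pop(seq)) if seq in self.replies else None


def poll_for_response_buggy(conn, seq):
    """Events first, then the reply, with no second look at events."""
    return conn.poll_for_event() or conn.poll_for_reply(seq)


class FixedPoller:
    """After getting a reply, check for events again and hold the reply back."""

    def __init__(self, conn):
        self.conn = conn
        self.saved_reply = None

    def poll_for_response(self, seq):
        event = self.conn.poll_for_event()
        if event is not None:
            return event
        if self.saved_reply is not None:
            reply, self.saved_reply = self.saved_reply, None
            return reply
        reply = self.conn.poll_for_reply(seq)
        if reply is not None:
            event = self.conn.poll_for_event()
            if event is not None:
                self.saved_reply = reply  # return it on a later call
                return event
        return reply


def server_traffic():
    # Nothing has arrived at the first read; event 1000 and reply 1001
    # arrive together at the second read.
    return [[], [("event", 1000), ("reply", 1001)]]


conn = FakeConnection(server_traffic())
buggy_order = [poll_for_response_buggy(conn, 1001) for _ in range(2)]

poller = FixedPoller(FakeConnection(server_traffic()))
fixed_order = [poller.poll_for_response(1001) for _ in range(2)]

print(buggy_order)  # [('reply', 1001), ('event', 1000)]
print(fixed_order)  # [('event', 1000), ('reply', 1001)]
```

In the unfixed ordering the reply for N+1 is handed back before the event for N, so the sequence goes backwards; with the extra event check the event is returned first and the saved reply follows on the next call.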
+ 
+ To test the race and the fix it's helpful to add a delay (I used a
+ "usleep(5000)") before calling xcb_poll_for_reply64.
+ 
+ Original patch written by Frediano Ziglio, see
+ https://gitlab.freedesktop.org/xorg/lib/libx11/-/merge_requests/34
+ 
+ Reworked primarily for readability by Peter Hutterer, see
+ https://gitlab.freedesktop.org/xorg/lib/libx11/-/merge_requests/53
+ 
+ Signed-off-by: Peter Hutterer <peter.hutte...@who-t.net>
+ 
+ bionic needs another commit so that the real fix applies.
+ 
+ [Test case]
+ 
+ It's a race condition; the SRU sponsor (tjaalton) does not have a test
+ case for this, but the bug subscribers seem to have one.
+ 
+ 
+ [Where things could go wrong]
+ 
+ In theory there might be a case where a race still happens, but since
+ this has been upstream for a year now with no follow-up commits, it's
+ safe to assume that there are no regressions.
+ 
+ 
+ --
+ 
  STEPS TO REPRODUCE
  ==================
  The bug seems to occur when clicking on a file or folder. It occurs at
  random, so it is difficult to provide clear steps to reproduce. It is,
  however, a common situation.
  
  EXPECTED RESULTS
  ================
  pcmanfm works without problem.
  
  ACTUAL RESULTS
  ==============
  All pcmanfm windows become unresponsive, though background processes (e.g.
  copying) may continue without problem. The same error message appears in
  ~/.cache/lxsession/LXDE/run.log:
  
  [xcb] Unknown sequence number while processing queue
  [xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
  [xcb] Aborting, sorry about that.
  pcmanfm: xcb_io.c:259: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
  ** Message: 19:58:49.267: app.vala:130: pcmanfm exit with this type of exit: 6
  ** Message: 19:58:49.268: app.vala:148: Exit not normal, try to reload
  
  (note the timestamp on the message will vary)
  
  AFFECTED VERSIONS
  =================
  1.2.5-3ubuntu1
  NOT 1.2.4-1ubuntu0.1
  
  UPSTREAM BUG
  ============
  https://sourceforge.net/p/pcmanfm/bugs/1089/
  
  ADDITIONAL NOTES
  ================
  Other GTK2 file managers (e.g. Thunar) and applications (e.g. GIMP, Leafpad)
  seem to have the same problems. This is probably rooted, at least in part, in a GTK2 bug:
  https://bugs.launchpad.net/ubuntu/+source/gtk+2.0/+bug/1808710
  
  To further support this, note that the SpaceFM file manager is
  available in both GTK2 and GTK3 versions. The GTK2 version displays the
  behavior; the GTK3 version does not. The same is true of LibreOffice.


-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to libx11 in Ubuntu.
https://bugs.launchpad.net/bugs/1782984

Title:
  Assertion `!xcb_xlib_threads_sequence_lost' failed with multiple
  applications

Status in libx11 package in Ubuntu:
  Fix Released
Status in libx11 source package in Bionic:
  In Progress
Status in libx11 source package in Focal:
  In Progress
Status in libx11 source package in Groovy:
  Won't Fix

