** Description changed: the [Impact], [Fix], [Test case] and [Where things could go wrong] sections were added ahead of the original report.
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to libx11 in Ubuntu.
https://bugs.launchpad.net/bugs/1782984

Title:
  Assertion `!xcb_xlib_threads_sequence_lost' failed with multiple
  applications

Status in libx11 package in Ubuntu:
  Fix Released
Status in libx11 source package in Bionic:
  In Progress
Status in libx11 source package in Focal:
  In Progress
Status in libx11 source package in Groovy:
  Won't Fix

Bug description:
  [Impact]

  There is a race in libx11 causing applications to randomly abort. It's
  not trivial to reproduce, but there are enough duplicates that this
  deserves an SRU to bionic & focal.

  [Fix]

  Backport a commit from upstream:

  From dbb55e1a5e82870466b095097d9e46046680ec25 Mon Sep 17 00:00:00 2001
  From: Frediano Ziglio <fzig...@redhat.com>
  Date: Wed, 29 Jan 2020 09:06:54 +0000
  Subject: [PATCH] Fix poll_for_response race condition

  In poll_for_response it is possible that event replies are skipped
  and a more up to date message reply is returned.
  This will cause the next poll_for_event call to fail, aborting the
  program.

  This was proved using a slow ssh tunnel or a program that slows
  down server replies (I used a combination of xtrace and strace).

  How the race happens:
  - the program enters poll_for_response;
  - poll_for_event is called but the server has not yet sent the reply;
  - pending_requests is not NULL because we sent a request (see the call
    to append_pending_request in _XSend);
  - xcb_poll_for_reply64 is called from poll_for_response;
  - xcb_poll_for_reply64 reads from the server; at this point the
    server replies with an event (say sequence N) and the reply to our
    last request (say sequence N+1);
  - xcb_poll_for_reply64 returns the reply for the request we asked;
  - last_request_read is set to sequence N+1 in poll_for_response;
  - poll_for_response returns the response to the request;
  - poll_for_event is called (for instance from another poll_for_response);
  - the event with sequence N is retrieved;
  - sequence N is widened; however, as the "new" number computed from
    last_request_read is less than N, the number is widened to N + 2^32
    (assuming last_request_read still fits in 32 bits);
  - poll_for_event enters the nested if statement as req is NULL;
  - we compare the widened N (which no longer fits in 32 bits) with
    request (which fits in 32 bits), hitting the
    throw_thread_fail_assert.

  To avoid the race condition and to keep the sequence from going
  backwards, I check again for new events after getting the response and
  return this last event if present, saving the reply to return it
  later.

  To test the race and the fix it's helpful to add a delay (I used a
  "usleep(5000)") before calling xcb_poll_for_reply64.

  Original patch written by Frediano Ziglio, see
  https://gitlab.freedesktop.org/xorg/lib/libx11/-/merge_requests/34

  Reworked primarily for readability by Peter Hutterer, see
  https://gitlab.freedesktop.org/xorg/lib/libx11/-/merge_requests/53

  Signed-off-by: Peter Hutterer <peter.hutte...@who-t.net>

  bionic needs another commit so that the real fix applies.

  [Test case]

  It's a race condition; the SRU sponsor (tjaalton) does not have a test
  case for this, but the bug subscribers seem to.
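
The widening arithmetic in the race description above can be illustrated with a small C sketch. It is modeled on the widen() helper in libX11's xcb_io.c but is not a copy of the library code: widen_event(), sequence_lost() and check_examples() are hypothetical names introduced here, and the comparison is a simplified stand-in for the check that ends in throw_thread_fail_assert. N = 100 is an arbitrary example value.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a 32-bit wire sequence number to 64 bits, relative to *wide.
 * Sketch modeled on libX11's widen(); not the library's exact code. */
static void widen(uint64_t *wide, uint32_t narrow)
{
    uint64_t candidate = (*wide & ~(uint64_t)0xFFFFFFFFu) | narrow;
    /* A candidate below the reference is assumed to have wrapped,
     * so it is bumped by 2^32. */
    *wide = candidate + ((uint64_t)(candidate < *wide) << 32);
}

/* Hypothetical helper: widen event_seq relative to last_request_read. */
static uint64_t widen_event(uint64_t last_request_read, uint32_t event_seq)
{
    uint64_t wide = last_request_read;
    widen(&wide, event_seq);
    return wide;
}

/* Hypothetical helper: simplified form of the comparison that aborts. */
static int sequence_lost(uint64_t widened_event, uint64_t request)
{
    return widened_event > request;
}

/* Walk through both orderings with N = 100 and the request at N + 1. */
static void check_examples(void)
{
    const uint32_t n = 100;
    const uint64_t request = 101;

    /* Event processed first: last_request_read is still below N. */
    assert(!sequence_lost(widen_event(99, n), request));

    /* Race: the reply (N + 1) was processed first, then the event N. */
    assert(widen_event(101, n) == (uint64_t)n + (1ULL << 32));
    assert(sequence_lost(widen_event(101, n), request));
}
```

Calling check_examples() from a test program exercises both orderings: only the second one, where last_request_read has already advanced past N, pushes the widened value above the request counter.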

  [Where things could go wrong]

  In theory there might be a case where a race still happens, but since
  this has been upstream for a year now with no follow-up commits, it's
  safe to assume that there are no regressions.

  --

  STEPS TO REPRODUCE
  ==================
  The bug seems to occur when clicking on a file or folder. It is random
  and difficult to provide clear steps to reproduce. It is, however, a
  common situation.

  EXPECTED RESULTS
  ================
  pcmanfm works without problem.

  ACTUAL RESULTS
  ==============
  All pcmanfm windows become unresponsive, though background processes
  (e.g. copying) may continue without problem, with the same error
  message in ~/.cache/lxsession/LXDE/run.log:

  [xcb] Unknown sequence number while processing queue
  [xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
  [xcb] Aborting, sorry about that.
  pcmanfm: xcb_io.c:259: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
  ** Message: 19:58:49.267: app.vala:130: pcmanfm exit with this type of exit: 6
  ** Message: 19:58:49.268: app.vala:148: Exit not normal, try to reload

  (note the timestamp on the message will vary)

  AFFECTED VERSIONS
  =================
  1.2.5-3ubuntu1
  NOT 1.2.4-1ubuntu0.1

  UPSTREAM BUG
  ============
  https://sourceforge.net/p/pcmanfm/bugs/1089/

  ADDITIONAL NOTES
  ================
  Other GTK2 file managers (e.g. Thunar) and applications (e.g. GIMP,
  Leafpad) seem to have the same problems. This is probably at least
  rooted in a GTK2 bug:
  https://bugs.launchpad.net/ubuntu/+source/gtk+2.0/+bug/1808710

  To further support this, note that the SpaceFM file manager is
  available in both GTK2 and GTK3 versions. The GTK2 version displays the
  behavior. The GTK3 version does not. Same with LibreOffice.