Public bug reported:

Release: Ubuntu 24.04 LTS (noble)
Installed version: 46.2-0ubuntu1

== Bug reference ==

This is a request to backport upstream commit 44814b8 into the noble package.

The crash is tracked in Launchpad bug #2156739 (filed against gnome-shell, which is where the memory accumulates, but the triggering defect is in xdg-desktop-portal-gnome).

Upstream issue filed against xdg-desktop-portal-gnome:
https://gitlab.gnome.org/GNOME/xdg-desktop-portal-gnome/-/work_items/218
(closed by maintainers as "version too old", directing fix to the distro)

== [Impact] ==

On Ubuntu 24.04 with GNOME 46, a laptop used in clamshell mode (lid closed, external HDMI monitor only, Intel xe/Arrow Lake-U GPU) hangs and requires a hard reboot after 2–3 hours of idle. Nine confirmed hang events have occurred.

Root cause chain:

1. Screen blank (DPMS off after 5-min idle) causes Mutter to emit MonitorsChanged on org.gnome.Mutter.DisplayConfig
2. xdg-desktop-portal-gnome's DisplayStateTracker responds with an async GetCurrentState call
3. In the async callback (get_current_state_cb), the code uses tracker->proxy rather than the proxy from the async result's source_object parameter. When tracker->proxy has become stale or mismatched by the time the callback fires (a race condition in the display state cycle), the call fails.
4. The failed call triggers an error log ("Monitor 'Built-in display' has no configuration which is-current!") which appears to re-schedule another GetCurrentState call, producing a retry loop that fires every ~8–9 seconds for the entire idle period.
5. Each iteration of this loop causes gnome-shell to allocate ~2 × 32 MB DMA-BUF framebuffer objects that are never released.
6. Leak rate: ~200 MB/min. After 2–3 hours idle: ~24 GB accumulated, at which point the system hangs.

This was confirmed with dbus-monitor and a custom fdinfo logger tracking gnome-shell's drm-total-gtt and exported DMA-BUF fd count. The DMA-BUF count rose steadily (e.g. 22 → 41 fds over 3 minutes) during idle with MonitorsChanged firing, and stopped rising when the user returned.
The leaked DMA-BUFs were never freed. Screen lock is NOT required: DPMS off alone triggers it (confirmed by setting lock-delay=3600 and observing locked=no throughout the leak period).

== [Test Case] ==

1. ThinkPad T16 Gen 4 (or similar) with Intel Arrow Lake-U / xe driver
2. Close laptop lid, connect only external HDMI monitor (clamshell mode)
3. Set idle-delay to 300 (5 minutes):
   gsettings set org.gnome.desktop.session idle-delay 300
4. Start this logger in a terminal before the system goes idle:

   while true; do
     # PID of the running shell; take the first match if there are several
     pid=$(pgrep -x gnome-shell | head -1)
     fds=0
     # Count fdinfo entries whose exporter name marks them as DRM DMA-BUFs
     for f in "/proc/$pid/fdinfo"/*; do
       grep -q "^exp_name:.*drm" "$f" 2>/dev/null && fds=$((fds+1))
     done
     echo "$(date '+%H:%M:%S') dmabuf_fds=$fds"
     sleep 15
   done

5. Leave system completely idle for 5+ minutes (allow screen to blank)
6. After returning from idle, check whether dmabuf_fds grew steadily during the blank period. On an affected system it rises ~6/min. On a fixed system it stays flat.

Affected: Ubuntu 24.04 noble, xdg-desktop-portal-gnome 46.2-0ubuntu1
Likely fixed: GNOME 47+ (Ubuntu 24.10+), based on inspection of current upstream source, which no longer shows retry behavior in this path.

== [Fix] ==

Upstream commit (merged into GNOME 47 development cycle, Sept 2024):
44814b8 "display-state-tracker: Use proxy from source object in callback"
https://github.com/GNOME/xdg-desktop-portal-gnome/commit/44814b8

Diff summary (src/displaystatetracker.c, get_current_state_cb function):

Before:

  if (!org_gnome_mutter_display_config_call_get_current_state_finish (
        tracker->proxy, ...))
    {
      g_warning ("Failed to get current display state: %s", error->message);
      return;
    }

After:

  OrgGnomeMutterDisplayConfig *proxy =
    ORG_GNOME_MUTTER_DISPLAY_CONFIG (source_object);
  ...
  if (!org_gnome_mutter_display_config_call_get_current_state_finish (
        proxy, ...))
    {
      if (!g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED))
        g_warning ("Failed to get current display state: %s", error->message);
      return;
    }

The fix uses the proxy provided by the async framework (source_object) instead of the potentially stale tracker->proxy, eliminating the race condition that causes the callback to fail and re-schedule indefinitely. It also suppresses spurious warnings for legitimately cancelled operations.

== [Regression Potential] ==

Low. The change is confined to a single async callback in displaystatetracker.c. It replaces a potentially stale object reference with the canonical one provided by the GLib async framework (source_object is always valid at callback time by GLib contract). The added G_IO_ERROR_CANCELLED check is purely cosmetic (reduces log noise). No functional change to the happy path.

** Affects: xdg-desktop-portal-gnome (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2156892

Title:
  [SRU] xdg-desktop-portal-gnome: async callback uses stale proxy,
  causing DMA-BUF leak and system hang in clamshell mode
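
Note for anyone verifying the fix: the counting logic from the test-case logger can be factored into small functions, so it can be checked against a synthetic directory before it is pointed at gnome-shell. The following is only a sketch under stated assumptions. The function names count_dmabuf_fds and fds_per_min are hypothetical and are not part of the report or of any package, and the exp_name match is the same pattern the logger in the test case uses.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the bug report).

# Count the fdinfo entries in a directory that look like DRM-exported
# DMA-BUFs, using the same exp_name pattern as the test-case logger.
# Usage: count_dmabuf_fds /proc/<pid>/fdinfo
count_dmabuf_fds() {
    dir=$1
    n=0
    for f in "$dir"/*; do
        # fds can close between the glob and the read; ignore those errors
        if grep -q '^exp_name:.*drm' "$f" 2>/dev/null; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Growth per minute between two samples, in integer arithmetic.
# Usage: fds_per_min <first_count> <last_count> <elapsed_seconds>
fds_per_min() {
    echo $(( ($2 - $1) * 60 / $3 ))
}
```

With these defined, a single sample would be taken with count_dmabuf_fds "/proc/$(pgrep -x gnome-shell | head -1)/fdinfo". Applied to the figures in the impact section (22 to 41 fds over 3 minutes), fds_per_min 22 41 180 gives 6, which matches the ~6/min rise described for an affected system.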
