** Summary changed:

- 6.8 backport for CVE-2025-21861 causes kernel hangs
+ Incorrect backport for CVE-2025-21861 causes kernel hangs

** Description changed:

- My team recently picked up linux-azure_6.8.0-1033.38 and linux-azure-
- nvidia_6.8.0-1021.2.  These kernels have failed our internal
- qualification because our nvidia health-checking scripts are getting
- stuck in a kernel call.  After debugging, the backport of the patch for
- "mm/migrate_device: don't add folio to be freed to LRU in
- migrate_device_finalize()" appears to have manual conflict resolution
- that introduced a bug.  I've reverted this patch in a test tree for
- linux-azure_6.8.0-1033.38 and validated that the problem resolves.  I've
- also come up with an alternate merge strategy with a much simpler conflict
- resolution, which does not reproduce the problem in testing.
+ BugLink: https://bugs.launchpad.net/bugs/2120330
  
- First, the problem: our nvbandwidth tasks are getting stuck waiting for
- one of the nvidia drivers to release memory.  The nvidia kernel task
- that needs to complete in order for this memory release to proceed is
- blocked waiting in migration_entry_wait_on_locked.  One of our partners at
- Nvidia also reproduced this problem and had page debug flags enabled.  He
- observed a BUG due to the PG_active and PG_lru bits being set in the page
- flags when the page was freed.
+ [Impact]
  
- stacks from the two participating processes:
+ The patch for CVE-2025-21861 was incorrectly backported to the noble 6.8 
+ kernel, leading to hangs when freeing device memory.
  
- PID: 871438   TASK: ffff007d4d668200  CPU: 95   COMMAND: "nvbandwidth"
+ commit 41cddf83d8b00f29fd105e7a0777366edc69a5cf
+ Author: David Hildenbrand <[email protected]>
+ Date:   Mon Feb 10 17:13:17 2025 +0100
+ Subject: mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=41cddf83d8b00f29fd105e7a0777366edc69a5cf
+ ubuntu-noble: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/commit/?id=3858edb1146374f3240d1ec769ba857186531b17
+ 
+ An incorrect backport was performed, causing the old page to be placed
+ back instead of the new page, e.g.:
+ 
+                 src = page_folio(page);
+                 dst = page_folio(newpage);
+ +               if (!is_zone_device_page(page))
+ +                       putback_lru_page(page);
+ 
+ when in 41cddf83d8b00f29fd105e7a0777366edc69a5cf we have:
+ 
+ +               if (!folio_is_zone_device(dst))
+ +                       folio_add_lru(dst);
+ 
+ in which case, we should really have had the backport as:
+ 
+ +               if (!folio_is_zone_device(newpage))
+ +                       folio_add_lru(newpage);
+ 
+ This keeps references alive to the old memory pages, preventing them from
+ being released and freed.
+ 
+ Stack traces of stuck processes:
+ 
+ PID: 871438 TASK: ffff007d4d668200 CPU: 95 COMMAND: "nvbandwidth"
   #0 [ffff80010e8ef840] __switch_to at ffffc0f22798c550
   #1 [ffff80010e8ef8a0] __schedule at ffffc0f22798c89c
   #2 [ffff80010e8ef900] schedule at ffffc0f22798cd40
   #3 [ffff80010e8ef930] schedule_preempt_disabled at ffffc0f22798d388
   #4 [ffff80010e8ef9c0] rwsem_down_write_slowpath at ffffc0f227990dc8
   #5 [ffff80010e8efa20] down_write at ffffc0f2279912d0
  #6 [ffff80010e8efaa0] uvm_va_space_mm_shutdown at ffffc0f1c2a451ec [nvidia_uvm]
  #7 [ffff80010e8efb00] uvm_va_space_mm_unregister at ffffc0f1c2a457a0 [nvidia_uvm]
  #8 [ffff80010e8efb30] uvm_release at ffffc0f1c2a226d4 [nvidia_uvm]
  #9 [ffff80010e8efc00] uvm_release_entry.part.0 at ffffc0f1c2a227dc [nvidia_uvm]
  #10 [ffff80010e8efc20] uvm_release_entry at ffffc0f1c2a22850 [nvidia_uvm]
  #11 [ffff80010e8efc30] __fput at ffffc0f2269a5760
  #12 [ffff80010e8efc70] ____fput at ffffc0f2269a5a80
  #13 [ffff80010e8efc80] task_work_run at ffffc0f2265ceedc
  #14 [ffff80010e8efcc0] do_exit at ffffc0f2265a0bc8
  #15 [ffff80010e8efcf0] do_group_exit at ffffc0f2265a0fec
  #16 [ffff80010e8efd50] get_signal at ffffc0f2265b8750
  #17 [ffff80010e8efe10] do_signal at ffffc0f22650166c
  #18 [ffff80010e8efe40] do_notify_resume at ffffc0f2265018f0
  #19 [ffff80010e8efe70] el0_interrupt at ffffc0f227985564
  #20 [ffff80010e8efe90] __el0_irq_handler_common at ffffc0f2279855f0
  #21 [ffff80010e8efea0] el0t_64_irq_handler at ffffc0f227986080
  #22 [ffff80010e8effe0] el0t_64_irq at ffffc0f2264f17fc
  
- PID: 871467   TASK: ffff007f6aa66000  CPU: 66   COMMAND: "UVM GPU4 BH"
+ PID: 871467 TASK: ffff007f6aa66000 CPU: 66 COMMAND: "UVM GPU4 BH"
   #0 [ffff80015ddef580] __switch_to at ffffc0f22798c550
   #1 [ffff80015ddef5e0] __schedule at ffffc0f22798c89c
   #2 [ffff80015ddef640] schedule at ffffc0f22798cd40
   #3 [ffff80015ddef670] io_schedule at ffffc0f22798cec4
   #4 [ffff80015ddef6e0] migration_entry_wait_on_locked at ffffc0f22686e3f0
   #5 [ffff80015ddef740] migration_entry_wait at ffffc0f22695a6d4
   #6 [ffff80015ddef750] do_swap_page at ffffc0f2268d6378
   #7 [ffff80015ddef7d0] handle_pte_fault at ffffc0f2268da688
   #8 [ffff80015ddef870] __handle_mm_fault at ffffc0f2268da7f8
   #9 [ffff80015ddef8b0] handle_mm_fault at ffffc0f2268dab48
  #10 [ffff80015ddef910] handle_fault at ffffc0f1c2aace18 [nvidia_uvm]
 #11 [ffff80015ddef950] uvm_populate_pageable_vma at ffffc0f1c2aacf24 [nvidia_uvm]
 #12 [ffff80015ddef990] migrate_pageable_vma_populate_mask at ffffc0f1c2aad8c0 [nvidia_uvm]
 #13 [ffff80015ddefab0] uvm_migrate_pageable at ffffc0f1c2ab0294 [nvidia_uvm]
 #14 [ffff80015ddefb90] service_ats_requests at ffffc0f1c2abf828 [nvidia_uvm]
 #15 [ffff80015ddefbb0] uvm_ats_service_faults at ffffc0f1c2ac02f0 [nvidia_uvm]
 #16 [ffff80015ddefd40] uvm_parent_gpu_service_non_replayable_fault_buffer at ffffc0f1c2a82e00 [nvidia_uvm]
 #17 [ffff80015ddefda0] non_replayable_faults_isr_bottom_half at ffffc0f1c2a3c3e4 [nvidia_uvm]
 #18 [ffff80015ddefe00] non_replayable_faults_isr_bottom_half_entry at ffffc0f1c2a3c590 [nvidia_uvm]
  #19 [ffff80015ddefe20] _main_loop at ffffc0f1c2a207c8 [nvidia_uvm]
  #20 [ffff80015ddefe70] kthread at ffffc0f2265d40dc
  
- For this one, I was able to find the wait_page_queue in the stack and
- get the folio from there:
+ There is no workaround.
  
- struct wait_page_queue {
-   folio = 0xffffffc0205cec80,
-   bit_nr = 0,
-   wait = {
-     flags = 0,
-     private = 0xffff007f6aa66000,
-     func = 0xffffc0f226867a30 <wake_page_function>,
-     entry = {
-       next = 0xffffc0f2297d2ae8 <folio_wait_table+3944>,
-       prev = 0xffffc0f2297d2ae8 <folio_wait_table+3944>
-     }
-   }
- }
+ [Fix]
  
- Folio page has flags: 396316561050206252, which means neither `PG_locked`
- nor `PG_waiters` is set.
+ To make things less confusing, revert the incorrect backport, backport
+ "mm: migrate_device: use more folio in migrate_device_finalize()" so the code
+ matches the newer upstream folio usage, and then correctly backport
+ "mm/migrate_device: don't add folio to be freed to LRU in
+ migrate_device_finalize()". This approach was suggested and tested by Krister
+ Johansen, and I think it is reasonable.
  
- Looking at Ubuntu's 6.8 backport of "mm/migrate_device: don't add folio
- to be freed to LRU in migrate_device_finalize()", the
- migrate_device_finalize code does this:
+ commit 58bf8c2bf47550bc94fea9cafd2bc7304d97102c
+ Author: Kefeng Wang <[email protected]>
+ Date:   Mon Aug 26 14:58:12 2024 +0800
+ Subject: mm: migrate_device: use more folio in migrate_device_finalize()
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=58bf8c2bf47550bc94fea9cafd2bc7304d97102c
  
- +               if (!is_zone_device_page(page))
- +                       putback_lru_page(page);
+ commit 41cddf83d8b00f29fd105e7a0777366edc69a5cf
+ Author: David Hildenbrand <[email protected]>
+ Date:   Mon Feb 10 17:13:17 2025 +0100
+ Subject: mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=41cddf83d8b00f29fd105e7a0777366edc69a5cf
  
- but upstream it does this instead:
+ The first patch landed in 6.12-rc1 and the second patch in 6.14-rc4. Both are
+ in plucky.
  
- +               if (!folio_is_zone_device(dst))
- +                       folio_add_lru(dst);
+ [Testcase]
  
- I think in some cases, this is actually putting back the old page when
- it shouldn't.  We want the new page instead, generally.  I get a cleaner
- conflict resolution if I apply the following:
+ There are a few ways to trigger the issue.
  
-   58bf8c2bf475 mm: migrate_device: use more folio in migrate_device_finalize()
-   41cddf83d8b0 mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()
+ You can run the hmm selftests.
  
- The second patch doesn't merge without conflicts, but has only a trivial
- conflict against:
+ 1) Check out a kernel git tree
+ 2) cd tools/testing/selftests/mm/
+ 3) make
+ 4) sudo ./hmm-tests
  
-   b1f202060afe mm: remap unused subpages to shared zeropage when splitting isolated thp
+ You can also run Nvidia tests like nvbandwidth, if your system has an Nvidia GPU:
+ https://github.com/NVIDIA/nvbandwidth
  
+ $ git clone https://github.com/NVIDIA/nvbandwidth.git
+ $ cd nvbandwidth
+ $ sudo ./debian_install.sh
+ $ sudo ./nvbandwidth
  
- -            remove_migration_ptes(src, dst, false);
- +               remove_migration_ptes(src, dst, 0);
+ A test package is available in the following ppa:
  
- Then the only resolution is rewriting the 0 -> false in the second
- patch.
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf416039-test
  
- I've been running with this modification and have not seen any hangs on
- a test that used to hang basically immediately.
+ If you install it and run the hmm selftests, they should no longer hang.
  
- Do you think Ubuntu would be able to fix this up before any of the
- kernels with this CVE graduate from proposed?
+ [Where problems can occur]
+ 
+ This changes some core mm code for device memory migration from standard pages
+ to folios, which carries some additional risk.
+ 
+ If a regression were to occur, it would primarily affect users of devices with
+ internal memory, such as graphics cards, and possibly high-end network cards.
+ 
+ The largest userbase affected by this regression is Nvidia GPU users, so it
+ would be a bad idea to release the broken implementation; instead, we should
+ respin and release with the fixed implementation.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2120330

Title:
  Incorrect backport for CVE-2025-21861 causes kernel hangs

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2120330/+subscriptions

