Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
On 10/2/24 15:52, Guenter Roeck wrote: > On 10/2/24 03:26, Vlastimil Babka wrote: >> On 10/1/24 18:20, Vlastimil Babka wrote: >>> Guenter Roeck reports that the new slub kunit tests added by commit >>> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and >>> test_leak_destroy()") cause a lockup on boot on several architectures >>> when the kunit tests are configured to be built-in and not modules. >>> >>> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection >>> showed the runner for built-in kunit tests kunit_run_all_tests() is >>> called before setting system_state to SYSTEM_RUNNING and calling >>> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I >>> was unable to reproduce the problem myself, skipping the test when the >>> slub_kunit module is built-in should avoid the issue. >>> >>> An alternative fix that was moving the call to kunit_run_all_tests() a >>> bit later in the boot was tried, but has broken tests with functions >>> marked as __init due to free_initmem() already being done. >>> >>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and >>> test_leak_destroy()") >>> Reported-by: Guenter Roeck >>> Closes: >>> https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/ >> >> I hope you can confirm it helps, because the commit added two tests and I've >> only skipped one of them, as it's the one using kfree_rcu(), which is >> suspected. But the other is responsible for the (now suppressed) >> kmem_cache_destroy() warning, and maybe I'm missing something and it was >> actually that one causing the lockups. >> > > Everything works with your patches applied, so we are good. Thanks for testing! Queued for -next now and will send to Linus later if all's good. >> Since you mentioned the boot lockups happened on some x86_64 too, do you >> have a .config of the lockup case? I've tried tweaking some rcu options but >> still nothing. >> > > I have a bunch of debug options enabled. Configuration (generated using > "make savedefconfig") for x86_64 is attached. Hmm, didn't see the hang with that (using virtme-ng) on v6.12-rc1. Guess there's something more to it. Oh well. > Thanks, > Guenter
Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
On 10/2/24 03:26, Vlastimil Babka wrote: On 10/1/24 18:20, Vlastimil Babka wrote: Guenter Roeck reports that the new slub kunit tests added by commit 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") cause a lockup on boot on several architectures when the kunit tests are configured to be built-in and not modules. The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection showed the runner for built-in kunit tests kunit_run_all_tests() is called before setting system_state to SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a likely cause. So while I was unable to reproduce the problem myself, skipping the test when the slub_kunit module is built-in should avoid the issue. An alternative fix that was moving the call to kunit_run_all_tests() a bit later in the boot was tried, but has broken tests with functions marked as __init due to free_initmem() already being done. Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") Reported-by: Guenter Roeck Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/ I hope you can confirm it helps, because the commit added two tests and I've only skipped one of them, as it's the one using kfree_rcu(), which is suspected. But the other is responsible for the (now suppressed) kmem_cache_destroy() warning, and maybe I'm missing something and it was actually that one causing the lockups. Everything works with your patches applied, so we are good. Since you mentioned the boot lockups happened on some x86_64 too, do you have a .config of the lockup case? I've tried tweaking some rcu options but still nothing. I have a bunch of debug options enabled. Configuration (generated using "make savedefconfig") for x86_64 is attached. Thanks, Guenter defconfig.gz Description: application/gzip
Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
On 10/1/24 09:20, Vlastimil Babka wrote: Guenter Roeck reports that the new slub kunit tests added by commit 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") cause a lockup on boot on several architectures when the kunit tests are configured to be built-in and not modules. The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection showed the runner for built-in kunit tests kunit_run_all_tests() is called before setting system_state to SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a likely cause. So while I was unable to reproduce the problem myself, skipping the test when the slub_kunit module is built-in should avoid the issue. An alternative fix that was moving the call to kunit_run_all_tests() a bit later in the boot was tried, but has broken tests with functions marked as __init due to free_initmem() already being done. Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") Reported-by: Guenter Roeck Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/ Cc: "Paul E. McKenney" Cc: Boqun Feng Cc: Uladzislau Rezki Cc: r...@vger.kernel.org Cc: Brendan Higgins Cc: David Gow Cc: Rae Moar Cc: linux-kselft...@vger.kernel.org Cc: kunit-...@googlegroups.com Signed-off-by: Vlastimil Babka This results in: KTAP version 1 # Subtest: slub_test # module: slub_kunit 1..8 # test_clobber_zone: pass:1 fail:0 skip:0 total:1 ok 1 test_clobber_zone # test_next_pointer: pass:1 fail:0 skip:0 total:1 ok 2 test_next_pointer # test_first_word: pass:1 fail:0 skip:0 total:1 ok 3 test_first_word # test_clobber_50th_byte: pass:1 fail:0 skip:0 total:1 ok 4 test_clobber_50th_byte # test_clobber_redzone_free: pass:1 fail:0 skip:0 total:1 ok 5 test_clobber_redzone_free # test_kmalloc_redzone_access: pass:1 fail:0 skip:0 total:1 ok 6 test_kmalloc_redzone_access # test_kfree_rcu: pass:0 fail:0 skip:1 total:1 ok 7 test_kfree_rcu # SKIP can't do kfree_rcu() when test is built-in # test_leak_destroy: pass:1 fail:0 skip:0 total:1 ok 8 test_leak_destroy # slub_test: pass:7 fail:0 skip:1 total:8 Tested-by: Guenter Roeck Thanks, Guenter
Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
On 10/1/24 18:20, Vlastimil Babka wrote: > Guenter Roeck reports that the new slub kunit tests added by commit > 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and > test_leak_destroy()") cause a lockup on boot on several architectures > when the kunit tests are configured to be built-in and not modules. > > The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection > showed the runner for built-in kunit tests kunit_run_all_tests() is > called before setting system_state to SYSTEM_RUNNING and calling > rcu_end_inkernel_boot(), so this seems like a likely cause. So while I > was unable to reproduce the problem myself, skipping the test when the > slub_kunit module is built-in should avoid the issue. > > An alternative fix that was moving the call to kunit_run_all_tests() a > bit later in the boot was tried, but has broken tests with functions > marked as __init due to free_initmem() already being done. > > Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and > test_leak_destroy()") > Reported-by: Guenter Roeck > Closes: > https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/ I hope you can confirm it helps, because the commit added two tests and I've only skipped one of them, as it's the one using kfree_rcu(), which is suspected. But the other is responsible for the (now suppressed) kmem_cache_destroy() warning, and maybe I'm missing something and it was actually that one causing the lockups. Since you mentioned the boot lockups happened on some x86_64 too, do you have a .config of the lockup case? I've tried tweaking some rcu options but still nothing. Thanks! > Cc: "Paul E. McKenney" > Cc: Boqun Feng > Cc: Uladzislau Rezki > Cc: r...@vger.kernel.org > Cc: Brendan Higgins > Cc: David Gow > Cc: Rae Moar > Cc: linux-kselft...@vger.kernel.org > Cc: kunit-...@googlegroups.com > Signed-off-by: Vlastimil Babka > --- > lib/slub_kunit.c | 14 ++ > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c > index > 85d51ec09846d4fa219db6bda336c6f0b89e98e4..80e39f003344858722a544ad62ed84e885574054 > 100644 > --- a/lib/slub_kunit.c > +++ b/lib/slub_kunit.c > @@ -164,10 +164,16 @@ struct test_kfree_rcu_struct { > > static void test_kfree_rcu(struct kunit *test) > { > - struct kmem_cache *s = test_kmem_cache_create("TestSlub_kfree_rcu", > - sizeof(struct test_kfree_rcu_struct), > - SLAB_NO_MERGE); > - struct test_kfree_rcu_struct *p = kmem_cache_alloc(s, GFP_KERNEL); > + struct kmem_cache *s; > + struct test_kfree_rcu_struct *p; > + > + if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST)) > + kunit_skip(test, "can't do kfree_rcu() when test is built-in"); > + > + s = test_kmem_cache_create("TestSlub_kfree_rcu", > +sizeof(struct test_kfree_rcu_struct), > +SLAB_NO_MERGE); > + p = kmem_cache_alloc(s, GFP_KERNEL); > > kfree_rcu(p, rcu); > kmem_cache_destroy(s); >