Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in

2024-10-02 Thread Vlastimil Babka
On 10/2/24 15:52, Guenter Roeck wrote:
> On 10/2/24 03:26, Vlastimil Babka wrote:
>> On 10/1/24 18:20, Vlastimil Babka wrote:
>>> Guenter Roeck reports that the new slub kunit tests added by commit
>>> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
>>> test_leak_destroy()") cause a lockup on boot on several architectures
>>> when the kunit tests are configured to be built-in and not modules.
>>>
>>> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
>>> showed the runner for built-in kunit tests kunit_run_all_tests() is
>>> called before setting system_state to SYSTEM_RUNNING and calling
>>> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
>>> was unable to reproduce the problem myself, skipping the test when the
>>> slub_kunit module is built-in should avoid the issue.
>>>
>>> An alternative fix that was moving the call to kunit_run_all_tests() a
>>> bit later in the boot was tried, but has broken tests with functions
>>> marked as __init due to free_initmem() already being done.
>>>
>>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and 
>>> test_leak_destroy()")
>>> Reported-by: Guenter Roeck 
>>> Closes: 
>>> https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/
>> 
>> I hope you can confirm it helps, because the commit added two tests and I've
>> only skipped one of them, as it's the one using kfree_rcu(), which is
>> suspected. But the other is responsible for the (now suppressed)
>> kmem_cache_destroy() warning, and maybe I'm missing something and it was
>> actually that one causing the lockups.
>> 
> 
> Everything works with your patches applied, so we are good.

Thanks for testing! Queued for -next now and will send to Linus later if
all's good.

>> Since you mentioned the boot lockups happened on some x86_64 too, do you
>> have a .config of the lockup case? I've tried tweaking some rcu options but
>> still nothing.
>> 
> 
> I have a bunch of debug options enabled. Configuration (generated using
> "make savedefconfig") for x86_64 is attached.

Hmm, didn't see the hang with that (using virtme-ng) on v6.12-rc1. Guess
there's something more to it. Oh well.

> Thanks,
> Guenter




Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in

2024-10-02 Thread Guenter Roeck

On 10/2/24 03:26, Vlastimil Babka wrote:

On 10/1/24 18:20, Vlastimil Babka wrote:

Guenter Roeck reports that the new slub kunit tests added by commit
4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
test_leak_destroy()") cause a lockup on boot on several architectures
when the kunit tests are configured to be built-in and not modules.

The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
showed the runner for built-in kunit tests kunit_run_all_tests() is
called before setting system_state to SYSTEM_RUNNING and calling
rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
was unable to reproduce the problem myself, skipping the test when the
slub_kunit module is built-in should avoid the issue.

An alternative fix that was moving the call to kunit_run_all_tests() a
bit later in the boot was tried, but has broken tests with functions
marked as __init due to free_initmem() already being done.

Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and 
test_leak_destroy()")
Reported-by: Guenter Roeck 
Closes: 
https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/


I hope you can confirm it helps, because the commit added two tests and I've
only skipped one of them, as it's the one using kfree_rcu(), which is
suspected. But the other is responsible for the (now suppressed)
kmem_cache_destroy() warning, and maybe I'm missing something and it was
actually that one causing the lockups.



Everything works with your patches applied, so we are good.


Since you mentioned the boot lockups happened on some x86_64 too, do you
have a .config of the lockup case? I've tried tweaking some rcu options but
still nothing.



I have a bunch of debug options enabled. Configuration (generated using
"make savedefconfig") for x86_64 is attached.

Thanks,
Guenter


defconfig.gz
Description: application/gzip


Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in

2024-10-02 Thread Guenter Roeck

On 10/1/24 09:20, Vlastimil Babka wrote:

Guenter Roeck reports that the new slub kunit tests added by commit
4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
test_leak_destroy()") cause a lockup on boot on several architectures
when the kunit tests are configured to be built-in and not modules.

The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
showed the runner for built-in kunit tests kunit_run_all_tests() is
called before setting system_state to SYSTEM_RUNNING and calling
rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
was unable to reproduce the problem myself, skipping the test when the
slub_kunit module is built-in should avoid the issue.

An alternative fix that was moving the call to kunit_run_all_tests() a
bit later in the boot was tried, but has broken tests with functions
marked as __init due to free_initmem() already being done.

Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and 
test_leak_destroy()")
Reported-by: Guenter Roeck 
Closes: 
https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/
Cc: "Paul E. McKenney" 
Cc: Boqun Feng 
Cc: Uladzislau Rezki 
Cc: r...@vger.kernel.org
Cc: Brendan Higgins 
Cc: David Gow 
Cc: Rae Moar 
Cc: linux-kselft...@vger.kernel.org
Cc: kunit-...@googlegroups.com
Signed-off-by: Vlastimil Babka 


This results in:

KTAP version 1
# Subtest: slub_test
# module: slub_kunit
1..8
# test_clobber_zone: pass:1 fail:0 skip:0 total:1
ok 1 test_clobber_zone
# test_next_pointer: pass:1 fail:0 skip:0 total:1
ok 2 test_next_pointer
# test_first_word: pass:1 fail:0 skip:0 total:1
ok 3 test_first_word
# test_clobber_50th_byte: pass:1 fail:0 skip:0 total:1
ok 4 test_clobber_50th_byte
# test_clobber_redzone_free: pass:1 fail:0 skip:0 total:1
ok 5 test_clobber_redzone_free
# test_kmalloc_redzone_access: pass:1 fail:0 skip:0 total:1
ok 6 test_kmalloc_redzone_access
# test_kfree_rcu: pass:0 fail:0 skip:1 total:1
ok 7 test_kfree_rcu # SKIP can't do kfree_rcu() when test is built-in
# test_leak_destroy: pass:1 fail:0 skip:0 total:1
ok 8 test_leak_destroy
# slub_test: pass:7 fail:0 skip:1 total:8

Tested-by: Guenter Roeck 

Thanks,
Guenter




Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in

2024-10-02 Thread Vlastimil Babka
On 10/1/24 18:20, Vlastimil Babka wrote:
> Guenter Roeck reports that the new slub kunit tests added by commit
> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
> test_leak_destroy()") cause a lockup on boot on several architectures
> when the kunit tests are configured to be built-in and not modules.
> 
> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
> showed the runner for built-in kunit tests kunit_run_all_tests() is
> called before setting system_state to SYSTEM_RUNNING and calling
> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
> was unable to reproduce the problem myself, skipping the test when the
> slub_kunit module is built-in should avoid the issue.
> 
> An alternative fix that was moving the call to kunit_run_all_tests() a
> bit later in the boot was tried, but has broken tests with functions
> marked as __init due to free_initmem() already being done.
> 
> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and 
> test_leak_destroy()")
> Reported-by: Guenter Roeck 
> Closes: 
> https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/

I hope you can confirm it helps, because the commit added two tests and I've
only skipped one of them, as it's the one using kfree_rcu(), which is
suspected. But the other is responsible for the (now suppressed)
kmem_cache_destroy() warning, and maybe I'm missing something and it was
actually that one causing the lockups.

Since you mentioned the boot lockups happened on some x86_64 too, do you
have a .config of the lockup case? I've tried tweaking some rcu options but
still nothing.

Thanks!

> Cc: "Paul E. McKenney" 
> Cc: Boqun Feng 
> Cc: Uladzislau Rezki 
> Cc: r...@vger.kernel.org
> Cc: Brendan Higgins 
> Cc: David Gow 
> Cc: Rae Moar 
> Cc: linux-kselft...@vger.kernel.org
> Cc: kunit-...@googlegroups.com
> Signed-off-by: Vlastimil Babka 
> ---
>  lib/slub_kunit.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
> index 
> 85d51ec09846d4fa219db6bda336c6f0b89e98e4..80e39f003344858722a544ad62ed84e885574054
>  100644
> --- a/lib/slub_kunit.c
> +++ b/lib/slub_kunit.c
> @@ -164,10 +164,16 @@ struct test_kfree_rcu_struct {
>  
>  static void test_kfree_rcu(struct kunit *test)
>  {
> - struct kmem_cache *s = test_kmem_cache_create("TestSlub_kfree_rcu",
> - sizeof(struct test_kfree_rcu_struct),
> - SLAB_NO_MERGE);
> - struct test_kfree_rcu_struct *p = kmem_cache_alloc(s, GFP_KERNEL);
> + struct kmem_cache *s;
> + struct test_kfree_rcu_struct *p;
> +
> + if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
> + kunit_skip(test, "can't do kfree_rcu() when test is built-in");
> +
> + s = test_kmem_cache_create("TestSlub_kfree_rcu",
> +sizeof(struct test_kfree_rcu_struct),
> +SLAB_NO_MERGE);
> + p = kmem_cache_alloc(s, GFP_KERNEL);
>  
>   kfree_rcu(p, rcu);
>   kmem_cache_destroy(s);
>