Re: [PATCH slab hotfixes 2/2] kunit: move call to kunit_run_all_tests() after rcu_end_inkernel_boot()

2024-10-02 Thread David Gow
On Tue, 1 Oct 2024 at 07:55, Guenter Roeck  wrote:
>
> On 9/30/24 11:50, Guenter Roeck wrote:
> > On 9/30/24 01:37, Vlastimil Babka wrote:
> >> Guenter Roeck reports that the new slub kunit tests added by commit
> >> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
> >> test_leak_destroy()") cause a lockup on boot on several architectures
> >> when the kunit tests are configured to be built-in and not modules.
> >>
> >> These tests invoke kfree_rcu() and kvfree_rcu_barrier() and boot
> >> sequence inspection showed the runner for built-in kunit tests
> >> kunit_run_all_tests() is called before setting system_state to
> >> SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a
> >> likely cause. So while I was unable to reproduce the problem myself,
> >> moving the call to kunit_run_all_tests() a bit later in the boot seems
> >> to have fixed the lockup problem according to Guenter's limited testing.
> >>
> >> No kunit tests should be broken by calling the built-in executor a bit
> >> later, as when compiled as modules, they are still executed even later
> >> than this.
> >>
>
> Actually, that is wrong.
>
> Turns out kunit_iov_iter (and other kunit tests) are marked __init.
> That means those unit tests have to run before the init code is released,
> and it actually _is_ harmful to run the tests after rcu_end_inkernel_boot()
> because at that time free_initmem() has already been called.

Yeah: some tests are marked __init. KUnit does actually mark these
with an attribute, so we can potentially split the execution up into
an 'init' part which runs early, and a later part, but there are some
complications if we still want to track the total number of tests and
support filtering, etc. properly.

That's something I think we'll look at for 6.13: in the meantime,
skipping the problematic slub tests when built-in seems to be the
right short-term fix. I'll look into having the built-in executor
moved later for non-init tests once we've worked out how best to adapt
the filter/KTAP output code to do so as cleanly as possible.
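A minimal userspace sketch of the split David describes. It assumes a per-suite flag that stands in for the KUnit attribute he mentions. Everything here (`struct suite`, `selected()`, `count_selected()`, `run_phase()`) is hypothetical and only models the bookkeeping problem. The KTAP plan line has to be computed over both phases with the same filter before the early phase starts, and test numbering has to continue across the two calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a KUnit suite (not the real struct kunit_suite). */
struct suite {
	const char *name;
	bool is_init;	/* test code lives in init memory (__init) */
};

/* Hypothetical filter: NULL selects everything; otherwise an exact name
 * match (real KUnit filtering is richer than this). */
static bool selected(const struct suite *s, const char *filter)
{
	return filter == NULL || strcmp(s->name, filter) == 0;
}

/* Count the selected suites across BOTH phases, so the KTAP plan line
 * ("1..N") printed before the early phase is already correct. */
static int count_selected(const struct suite *suites, int n, const char *filter)
{
	int total = 0;

	for (int i = 0; i < n; i++)
		if (selected(&suites[i], filter))
			total++;
	return total;
}

/* Run the selected suites that belong to one phase. 'next' is the KTAP
 * test number to use next; the updated value is returned so the late
 * phase continues numbering where the early phase stopped. */
static int run_phase(const struct suite *suites, int n, const char *filter,
		     bool init_phase, int next, int total)
{
	for (int i = 0; i < n; i++) {
		const struct suite *s = &suites[i];

		if (!selected(s, filter) || s->is_init != init_phase)
			continue;
		assert(next <= total);	/* never exceed the advertised plan */
		printf("ok %d %s\n", next, s->name);
		next++;
	}
	return next;
}
```

In this model the early call would sit where kunit_run_all_tests() is today and the late call after rcu_end_inkernel_boot(). Printing the plan once and threading `next` between the two calls is what keeps the KTAP output consistent.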

Cheers,
-- David


>
> Guenter
>
> >> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
> >> Reported-by: Guenter Roeck
> >> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/
> >> Cc: "Paul E. McKenney" 
> >> Cc: Boqun Feng 
> >> Cc: Uladzislau Rezki 
> >> Cc: r...@vger.kernel.org
> >> Cc: Brendan Higgins 
> >> Cc: David Gow 
> >> Cc: Rae Moar 
> >> Cc: linux-kselft...@vger.kernel.org
> >> Cc: kunit-...@googlegroups.com
> >> Signed-off-by: Vlastimil Babka 
> >> ---
> >>   init/main.c | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/init/main.c b/init/main.c
> >> index c4778edae7972f512d5eefe8400075ac35a70d1c..7890ebb00e84b8bd7bac28923fb1fe571b3e9ee2 100644
> >> --- a/init/main.c
> >> +++ b/init/main.c
> >> @@ -1489,6 +1489,8 @@ static int __ref kernel_init(void *unused)
> >>  
> >>  	rcu_end_inkernel_boot();
> >>  
> >> +	kunit_run_all_tests();
> >> +
> >>  	do_sysctl_args();
> >>  
> >>  	if (ramdisk_execute_command) {
> >> @@ -1579,8 +1581,6 @@ static noinline void __init kernel_init_freeable(void)
> >>  
> >>  	do_basic_setup();
> >>  
> >> -	kunit_run_all_tests();
> >> -
> >>  	wait_for_initramfs();
> >>  	console_on_rootfs();
> >>
> > Unfortunately it doesn't work. With this patch applied, I get many backtraces
> > similar to the following, and ultimately the image crashes. This is with arm64.
> > I do not see the problem if I drop this patch.
> >
> > Guenter
> >
> > ---
> > [    9.465871] KTAP version 1
> > [    9.465964] # Subtest: iov_iter
> > [    9.466056] # module: kunit_iov_iter
> > [    9.466115] 1..12
> > [    9.467000] Unable to handle kernel paging request at virtual address c37db5c9f26c
> > [    9.467244] Mem abort info:
> > [    9.467332]   ESR = 0x8607
> > [    9.467454]   EC = 0x21: IABT (current EL), IL = 32 bits
> > [    9.467576]   SET = 0, FnV = 0
> > [    9.467667]   EA = 0, S1PTW = 0
> > [    9.467762]   FSC = 0x07: level 3 translation fault
> > [    9.467912] swapper pgtable: 4k pages, 48-bit VAs, pgdp=42a59000
> > [    9.468055] [c37db5c9f26c] pgd=, p4d=100044b36003, pud=100044b37003, pmd=100044b3a003, pte=
> > [    9.469430] Internal error: Oops: 8607 [#1] PREEMPT SMP
> > [    9.469687] Modules linked in:
> > [    9.470035] CPU: 0 UID: 0 PID: 550 Comm: kunit_try_catch Tainted: G                 N 6.12.0-rc1-5-ga65e3eb58cdb #1
> > [    9.470290] Tainted: [N]=TEST
> > [    9.470356] Hardware name: linux,dummy-virt (DT)
> > [    9.470530] pstate: 8005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    9.470656] pc : iov_kunit_copy_to_kvec+0x0/0x334
> > [    9.471055] lr : kunit_try_run_case+0x6c/0x15c
> > [    9.471145] sp : 800080883de0
> > [    9.471210] x29: 80

Re: [PATCH slab hotfixes 2/2] kunit: move call to kunit_run_all_tests() after rcu_end_inkernel_boot()

2024-10-01 Thread Vlastimil Babka
On 10/1/24 1:55 AM, Guenter Roeck wrote:
> On 9/30/24 11:50, Guenter Roeck wrote:
>> On 9/30/24 01:37, Vlastimil Babka wrote:
>>> Guenter Roeck reports that the new slub kunit tests added by commit
>>> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
>>> test_leak_destroy()") cause a lockup on boot on several architectures
>>> when the kunit tests are configured to be built-in and not modules.
>>>
>>> These tests invoke kfree_rcu() and kvfree_rcu_barrier() and boot
>>> sequence inspection showed the runner for built-in kunit tests
>>> kunit_run_all_tests() is called before setting system_state to
>>> SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a
>>> likely cause. So while I was unable to reproduce the problem myself,
>>> moving the call to kunit_run_all_tests() a bit later in the boot seems
>>> to have fixed the lockup problem according to Guenter's limited testing.
>>>
>>> No kunit tests should be broken by calling the built-in executor a bit
>>> later, as when compiled as modules, they are still executed even later
>>> than this.
>>>
> 
> Actually, that is wrong.
> 
> Turns out kunit_iov_iter (and other kunit tests) are marked __init.
> That means those unit tests have to run before the init code is released,
> and it actually _is_ harmful to run the tests after rcu_end_inkernel_boot()
> because at that time free_initmem() has already been called.

Oh, guess that explains why the kunit_run_all_tests() executor is called
so suspiciously early. Of course when built as modules, __init has a
different lifetime.

Guess I will just skip the two new tests using kfree_rcu() when the slub
kunit is built-in then. Thanks for testing.
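A sketch of that short-term fix, written as plain C so the control flow can run outside the kernel. In-kernel code would presumably test `IS_BUILTIN()` on the test's Kconfig symbol and call `kunit_skip()`. Here `struct fake_kunit`, `fake_skip()`, `kfree_rcu_calls` and the `builtin` parameter are hypothetical stand-ins for those, and only one of the two tests is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fake_status { FAKE_PASS, FAKE_SKIP };

/* Hypothetical stand-in for struct kunit: records only the outcome. */
struct fake_kunit {
	enum fake_status status;
	const char *reason;
};

/* Mimics marking a test as skipped and recording why. The real
 * kunit_skip() also ends the test case; here the caller returns. */
static void fake_skip(struct fake_kunit *test, const char *reason)
{
	assert(reason != NULL);
	test->status = FAKE_SKIP;
	test->reason = reason;
}

static int kfree_rcu_calls;	/* how often the RCU-dependent path ran */

/* 'builtin' stands in for IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST). */
static void test_kfree_rcu(struct fake_kunit *test, bool builtin)
{
	if (builtin) {
		/* Built-in tests run before rcu_end_inkernel_boot(), so do
		 * not exercise kfree_rcu()/kvfree_rcu_barrier() at all. */
		fake_skip(test, "skipped: test is built-in");
		return;
	}
	kfree_rcu_calls++;	/* placeholder for the real kfree_rcu() test body */
	test->status = FAKE_PASS;
}
```

The skip keeps the test visible in the KTAP output as skipped rather than silently dropping it, so a built-in run still reports that the RCU path was not exercised.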

> Guenter
> 
>>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
>>> Reported-by: Guenter Roeck
>>> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/
>>> Cc: "Paul E. McKenney" 
>>> Cc: Boqun Feng 
>>> Cc: Uladzislau Rezki 
>>> Cc: r...@vger.kernel.org
>>> Cc: Brendan Higgins 
>>> Cc: David Gow 
>>> Cc: Rae Moar 
>>> Cc: linux-kselft...@vger.kernel.org
>>> Cc: kunit-...@googlegroups.com
>>> Signed-off-by: Vlastimil Babka 
>>> ---
>>>   init/main.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/init/main.c b/init/main.c
>>> index c4778edae7972f512d5eefe8400075ac35a70d1c..7890ebb00e84b8bd7bac28923fb1fe571b3e9ee2 100644
>>> --- a/init/main.c
>>> +++ b/init/main.c
>>> @@ -1489,6 +1489,8 @@ static int __ref kernel_init(void *unused)
>>>  
>>>  	rcu_end_inkernel_boot();
>>>  
>>> +	kunit_run_all_tests();
>>> +
>>>  	do_sysctl_args();
>>>  
>>>  	if (ramdisk_execute_command) {
>>> @@ -1579,8 +1581,6 @@ static noinline void __init kernel_init_freeable(void)
>>>  
>>>  	do_basic_setup();
>>>  
>>> -	kunit_run_all_tests();
>>> -
>>>  	wait_for_initramfs();
>>>  	console_on_rootfs();
>>>
>> Unfortunately it doesn't work. With this patch applied, I get many backtraces
>> similar to the following, and ultimately the image crashes. This is with arm64.
>> I do not see the problem if I drop this patch.
>>
>> Guenter
>>
>> ---
>> [    9.465871] KTAP version 1
>> [    9.465964] # Subtest: iov_iter
>> [    9.466056] # module: kunit_iov_iter
>> [    9.466115] 1..12
>> [    9.467000] Unable to handle kernel paging request at virtual address c37db5c9f26c
>> [    9.467244] Mem abort info:
>> [    9.467332]   ESR = 0x8607
>> [    9.467454]   EC = 0x21: IABT (current EL), IL = 32 bits
>> [    9.467576]   SET = 0, FnV = 0
>> [    9.467667]   EA = 0, S1PTW = 0
>> [    9.467762]   FSC = 0x07: level 3 translation fault
>> [    9.467912] swapper pgtable: 4k pages, 48-bit VAs, pgdp=42a59000
>> [    9.468055] [c37db5c9f26c] pgd=, p4d=100044b36003, pud=100044b37003, pmd=100044b3a003, pte=
>> [    9.469430] Internal error: Oops: 8607 [#1] PREEMPT SMP
>> [    9.469687] Modules linked in:
>> [    9.470035] CPU: 0 UID: 0 PID: 550 Comm: kunit_try_catch Tainted: G                 N 6.12.0-rc1-5-ga65e3eb58cdb #1
>> [    9.470290] Tainted: [N]=TEST
>> [    9.470356] Hardware name: linux,dummy-virt (DT)
>> [    9.470530] pstate: 8005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [    9.470656] pc : iov_kunit_copy_to_kvec+0x0/0x334
>> [    9.471055] lr : kunit_try_run_case+0x6c/0x15c
>> [    9.471145] sp : 800080883de0
>> [    9.471210] x29: 800080883e20 x28:  x27:
>> [    9.471376] x26:  x25:  x24: 80008000bb68
>> [    9.471501] x23: c37db3f7093c x22: 80008000b940 x21: 545847af4c00
>> [    9.471622] x20: 545847cd3940 x19: 80008000bb50 x18: 0006
>> [    9.471742] x17: 6c61746f7420303a x16: 70696b7320303a6c x15: 0172
>> [    9.471863] x14: 0002 x13:  x12: c37db6a600c8

Re: [PATCH slab hotfixes 2/2] kunit: move call to kunit_run_all_tests() after rcu_end_inkernel_boot()

2024-09-30 Thread Guenter Roeck

On 9/30/24 11:50, Guenter Roeck wrote:

> On 9/30/24 01:37, Vlastimil Babka wrote:
>> Guenter Roeck reports that the new slub kunit tests added by commit
>> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
>> test_leak_destroy()") cause a lockup on boot on several architectures
>> when the kunit tests are configured to be built-in and not modules.
>>
>> These tests invoke kfree_rcu() and kvfree_rcu_barrier() and boot
>> sequence inspection showed the runner for built-in kunit tests
>> kunit_run_all_tests() is called before setting system_state to
>> SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a
>> likely cause. So while I was unable to reproduce the problem myself,
>> moving the call to kunit_run_all_tests() a bit later in the boot seems
>> to have fixed the lockup problem according to Guenter's limited testing.
>>
>> No kunit tests should be broken by calling the built-in executor a bit
>> later, as when compiled as modules, they are still executed even later
>> than this.
>>

Actually, that is wrong.

Turns out kunit_iov_iter (and other kunit tests) are marked __init.
That means those unit tests have to run before the init code is released,
and it actually _is_ harmful to run the tests after rcu_end_inkernel_boot()
because at that time free_initmem() has already been called.
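Guenter's point reduces to a lifetime rule: code placed in init memory may only be called before that memory is released. The sketch below models it with a flag. `initmem_freed`, `fake_free_initmem()`, `init_test_body()` and `call_init_test()` are hypothetical; where the kernel would take a paging fault (as in the oops in this thread), the model simply refuses the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*test_fn)(int *result);

static bool initmem_freed;	/* set once "init memory" has been released */

/* Stand-in for a test body marked __init (e.g. iov_kunit_copy_to_kvec). */
static void init_test_body(int *result)
{
	*result = 42;
}

/* Hypothetical model of free_initmem(): afterwards, init code is gone. */
static void fake_free_initmem(void)
{
	initmem_freed = true;
}

/* Returns false instead of faulting when the __init test is no longer
 * available; in the kernel the call itself would oops. */
static bool call_init_test(test_fn fn, int *result)
{
	assert(fn != NULL);
	if (initmem_freed)
		return false;
	fn(result);
	return true;
}
```

The only safe ordering in this model is to run every init-resident test before `fake_free_initmem()`, which is why the built-in executor cannot simply be moved past free_initmem().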

Guenter


>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
>> Reported-by: Guenter Roeck
>> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/
>> Cc: "Paul E. McKenney"
>> Cc: Boqun Feng
>> Cc: Uladzislau Rezki
>> Cc: r...@vger.kernel.org
>> Cc: Brendan Higgins
>> Cc: David Gow
>> Cc: Rae Moar
>> Cc: linux-kselft...@vger.kernel.org
>> Cc: kunit-...@googlegroups.com
>> Signed-off-by: Vlastimil Babka
>> ---
>>  init/main.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/init/main.c b/init/main.c
>> index c4778edae7972f512d5eefe8400075ac35a70d1c..7890ebb00e84b8bd7bac28923fb1fe571b3e9ee2 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -1489,6 +1489,8 @@ static int __ref kernel_init(void *unused)
>>  
>>  	rcu_end_inkernel_boot();
>>  
>> +	kunit_run_all_tests();
>> +
>>  	do_sysctl_args();
>>  
>>  	if (ramdisk_execute_command) {
>> @@ -1579,8 +1581,6 @@ static noinline void __init kernel_init_freeable(void)
>>  
>>  	do_basic_setup();
>>  
>> -	kunit_run_all_tests();
>> -
>>  	wait_for_initramfs();
>>  	console_on_rootfs();
>>
> Unfortunately it doesn't work. With this patch applied, I get many backtraces
> similar to the following, and ultimately the image crashes. This is with arm64.
> I do not see the problem if I drop this patch.
>
> Guenter
>
> ---
> [    9.465871] KTAP version 1
> [    9.465964] # Subtest: iov_iter
> [    9.466056] # module: kunit_iov_iter
> [    9.466115] 1..12
> [    9.467000] Unable to handle kernel paging request at virtual address c37db5c9f26c
> [    9.467244] Mem abort info:
> [    9.467332]   ESR = 0x8607
> [    9.467454]   EC = 0x21: IABT (current EL), IL = 32 bits
> [    9.467576]   SET = 0, FnV = 0
> [    9.467667]   EA = 0, S1PTW = 0
> [    9.467762]   FSC = 0x07: level 3 translation fault
> [    9.467912] swapper pgtable: 4k pages, 48-bit VAs, pgdp=42a59000
> [    9.468055] [c37db5c9f26c] pgd=, p4d=100044b36003, pud=100044b37003, pmd=100044b3a003, pte=
> [    9.469430] Internal error: Oops: 8607 [#1] PREEMPT SMP
> [    9.469687] Modules linked in:
> [    9.470035] CPU: 0 UID: 0 PID: 550 Comm: kunit_try_catch Tainted: G                 N 6.12.0-rc1-5-ga65e3eb58cdb #1
> [    9.470290] Tainted: [N]=TEST
> [    9.470356] Hardware name: linux,dummy-virt (DT)
> [    9.470530] pstate: 8005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    9.470656] pc : iov_kunit_copy_to_kvec+0x0/0x334
> [    9.471055] lr : kunit_try_run_case+0x6c/0x15c
> [    9.471145] sp : 800080883de0
> [    9.471210] x29: 800080883e20 x28:  x27:
> [    9.471376] x26:  x25:  x24: 80008000bb68
> [    9.471501] x23: c37db3f7093c x22: 80008000b940 x21: 545847af4c00
> [    9.471622] x20: 545847cd3940 x19: 80008000bb50 x18: 0006
> [    9.471742] x17: 6c61746f7420303a x16: 70696b7320303a6c x15: 0172
> [    9.471863] x14: 0002 x13:  x12: c37db6a600c8
> [    9.471983] x11: 0043 x10: 0043 x9 : 1fff
> [    9.472122] x8 :  x7 : 1040d4fd x6 : c37db70c3810
> [    9.472243] x5 :  x4 : c4653600 x3 : 3b9ac9ff
> [    9.472363] x2 : 0001 x1 : c37db5c9f26c x0 : 80008000bb50
> [    9.472572] Call trace:
> [    9.472636]  iov_kunit_copy_to_kvec+0x0/0x334
> [    9.472740]  kunit_generic_run_threadfn_adapter+0x28/0x4c
> [    9.472835]  kthread+0x11c/0x120
> [    9.472903]  ret_from_fork+0x10/0x20
> [    9.473146] Code:     ()
> [    9.473505] ---[ end trace  ]---

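The "Mem abort info" lines in the log are the kernel's decoding of the arm64 ESR_ELx syndrome value. Below is a minimal sketch of that decoding. It assumes the Arm ARM layout for instruction aborts: EC in bits [31:26], IL in bit 25, SET in bits [12:11], FnV in bit 10, EA in bit 9, S1PTW in bit 7 and the fault status code in bits [5:0]. The printed ESR value appears to have lost digits in this archive, so the test rebuilds a value from the decoded fields (EC = 0x21, IL = 1, FSC = 0x07) instead of quoting the log. The helper names (`decode_esr()`, `encode_iabt()`, `ec_name()`, `print_esr()`) are illustrative, not the kernel's own esr.h macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of an arm64 ESR_ELx value for an instruction abort. */
struct esr_fields {
	unsigned int ec;	/* exception class, bits [31:26] */
	unsigned int il;	/* instruction length, bit 25 (1 = 32-bit) */
	unsigned int set;	/* synchronous error type, bits [12:11] */
	unsigned int fnv;	/* FAR not valid, bit 10 */
	unsigned int ea;	/* external abort type, bit 9 */
	unsigned int s1ptw;	/* fault on stage-1 page-table walk, bit 7 */
	unsigned int fsc;	/* fault status code, bits [5:0] */
};

static struct esr_fields decode_esr(uint64_t esr)
{
	struct esr_fields f = {
		.ec    = (unsigned int)((esr >> 26) & 0x3f),
		.il    = (unsigned int)((esr >> 25) & 0x1),
		.set   = (unsigned int)((esr >> 11) & 0x3),
		.fnv   = (unsigned int)((esr >> 10) & 0x1),
		.ea    = (unsigned int)((esr >> 9) & 0x1),
		.s1ptw = (unsigned int)((esr >> 7) & 0x1),
		.fsc   = (unsigned int)(esr & 0x3f),
	};
	return f;
}

/* Build an instruction-abort ESR from its main fields (other bits zero). */
static uint64_t encode_iabt(unsigned int ec, unsigned int il, unsigned int fsc)
{
	assert(ec <= 0x3f && il <= 1 && fsc <= 0x3f);	/* field widths */
	return ((uint64_t)ec << 26) | ((uint64_t)il << 25) | fsc;
}

static const char *ec_name(unsigned int ec)
{
	switch (ec) {
	case 0x20: return "IABT (lower EL)";
	case 0x21: return "IABT (current EL)";
	case 0x24: return "DABT (lower EL)";
	case 0x25: return "DABT (current EL)";
	default:   return "other";
	}
}

/* FSC 0x04..0x07 encode translation faults at lookup levels 0..3. */
static int translation_fault_level(unsigned int fsc)
{
	return (fsc >= 0x04 && fsc <= 0x07) ? (int)(fsc - 0x04) : -1;
}

static void print_esr(uint64_t esr)
{
	struct esr_fields f = decode_esr(esr);
	int level = translation_fault_level(f.fsc);

	printf("EC = 0x%02x: %s, IL = %u bits\n", f.ec, ec_name(f.ec),
	       f.il ? 32u : 16u);
	printf("SET = %u, FnV = %u\n", f.set, f.fnv);
	printf("EA = %u, S1PTW = %u\n", f.ea, f.s1ptw);
	if (level >= 0)
		printf("FSC = 0x%02x: level %d translation fault\n", f.fsc, level);
	else
		printf("FSC = 0x%02x\n", f.fsc);
}
```

Calling `print_esr(encode_iabt(0x21, 1, 0x07))` prints the EC, SET/FnV, EA/S1PTW and FSC lines in the same form as the log. An instruction abort on the very first instruction of the function (`pc : iov_kunit_copy_to_kvec+0x0`) is consistent with that function's code no longer being mapped.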

Re: [PATCH slab hotfixes 2/2] kunit: move call to kunit_run_all_tests() after rcu_end_inkernel_boot()

2024-09-30 Thread Guenter Roeck

On 9/30/24 01:37, Vlastimil Babka wrote:

> Guenter Roeck reports that the new slub kunit tests added by commit
> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
> test_leak_destroy()") cause a lockup on boot on several architectures
> when the kunit tests are configured to be built-in and not modules.
>
> These tests invoke kfree_rcu() and kvfree_rcu_barrier() and boot
> sequence inspection showed the runner for built-in kunit tests
> kunit_run_all_tests() is called before setting system_state to
> SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a
> likely cause. So while I was unable to reproduce the problem myself,
> moving the call to kunit_run_all_tests() a bit later in the boot seems
> to have fixed the lockup problem according to Guenter's limited testing.
>
> No kunit tests should be broken by calling the built-in executor a bit
> later, as when compiled as modules, they are still executed even later
> than this.
>
> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
> Reported-by: Guenter Roeck
> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9...@roeck-us.net/
> Cc: "Paul E. McKenney"
> Cc: Boqun Feng
> Cc: Uladzislau Rezki
> Cc: r...@vger.kernel.org
> Cc: Brendan Higgins
> Cc: David Gow
> Cc: Rae Moar
> Cc: linux-kselft...@vger.kernel.org
> Cc: kunit-...@googlegroups.com
> Signed-off-by: Vlastimil Babka
> ---
>  init/main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/init/main.c b/init/main.c
> index c4778edae7972f512d5eefe8400075ac35a70d1c..7890ebb00e84b8bd7bac28923fb1fe571b3e9ee2 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1489,6 +1489,8 @@ static int __ref kernel_init(void *unused)
>  
>  	rcu_end_inkernel_boot();
>  
> +	kunit_run_all_tests();
> +
>  	do_sysctl_args();
>  
>  	if (ramdisk_execute_command) {
> @@ -1579,8 +1581,6 @@ static noinline void __init kernel_init_freeable(void)
>  
>  	do_basic_setup();
>  
> -	kunit_run_all_tests();
> -
>  	wait_for_initramfs();
>  	console_on_rootfs();
>

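The diff moves the executor past two boot milestones at once. A small model of that follows. It assumes a simplified step list whose names mirror init/main.c and in which free_initmem() precedes rcu_end_inkernel_boot(), as Guenter's follow-up in this thread states. `struct boot_state` and `state_at_kunit()` are hypothetical helpers that report which milestones have passed when kunit_run_all_tests() is reached; the step lists in the test are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* What the kunit executor can observe when it runs. */
struct boot_state {
	bool initmem_freed;	/* free_initmem() has run */
	bool rcu_boot_ended;	/* rcu_end_inkernel_boot() has run */
};

/* Walk an ordered list of boot steps and return the state at the point
 * where "kunit_run_all_tests" appears. */
static struct boot_state state_at_kunit(const char *const *steps, int n)
{
	struct boot_state st = { false, false };

	for (int i = 0; i < n; i++) {
		if (strcmp(steps[i], "kunit_run_all_tests") == 0)
			return st;
		if (strcmp(steps[i], "free_initmem") == 0)
			st.initmem_freed = true;
		else if (strcmp(steps[i], "rcu_end_inkernel_boot") == 0)
			st.rcu_boot_ended = true;
	}
	assert(!"kunit_run_all_tests not found in step list");
	return st;
}
```

With the original placement both flags are false; with the patch both are true. That satisfies the RCU requirement of the new slub tests but also means __init test code has already been released.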

Unfortunately it doesn't work. With this patch applied, I get many backtraces
similar to the following, and ultimately the image crashes. This is with arm64.
I do not see the problem if I drop this patch.

Guenter

---
[    9.465871] KTAP version 1
[    9.465964] # Subtest: iov_iter
[    9.466056] # module: kunit_iov_iter
[    9.466115] 1..12
[    9.467000] Unable to handle kernel paging request at virtual address c37db5c9f26c
[    9.467244] Mem abort info:
[    9.467332]   ESR = 0x8607
[    9.467454]   EC = 0x21: IABT (current EL), IL = 32 bits
[    9.467576]   SET = 0, FnV = 0
[    9.467667]   EA = 0, S1PTW = 0
[    9.467762]   FSC = 0x07: level 3 translation fault
[    9.467912] swapper pgtable: 4k pages, 48-bit VAs, pgdp=42a59000
[    9.468055] [c37db5c9f26c] pgd=, p4d=100044b36003, pud=100044b37003, pmd=100044b3a003, pte=
[    9.469430] Internal error: Oops: 8607 [#1] PREEMPT SMP
[    9.469687] Modules linked in:
[    9.470035] CPU: 0 UID: 0 PID: 550 Comm: kunit_try_catch Tainted: G                 N 6.12.0-rc1-5-ga65e3eb58cdb #1
[    9.470290] Tainted: [N]=TEST
[    9.470356] Hardware name: linux,dummy-virt (DT)
[    9.470530] pstate: 8005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    9.470656] pc : iov_kunit_copy_to_kvec+0x0/0x334
[    9.471055] lr : kunit_try_run_case+0x6c/0x15c
[    9.471145] sp : 800080883de0
[    9.471210] x29: 800080883e20 x28:  x27:
[    9.471376] x26:  x25:  x24: 80008000bb68
[    9.471501] x23: c37db3f7093c x22: 80008000b940 x21: 545847af4c00
[    9.471622] x20: 545847cd3940 x19: 80008000bb50 x18: 0006
[    9.471742] x17: 6c61746f7420303a x16: 70696b7320303a6c x15: 0172
[    9.471863] x14: 0002 x13:  x12: c37db6a600c8
[    9.471983] x11: 0043 x10: 0043 x9 : 1fff
[    9.472122] x8 :  x7 : 1040d4fd x6 : c37db70c3810
[    9.472243] x5 :  x4 : c4653600 x3 : 3b9ac9ff
[    9.472363] x2 : 0001 x1 : c37db5c9f26c x0 : 80008000bb50
[    9.472572] Call trace:
[    9.472636]  iov_kunit_copy_to_kvec+0x0/0x334
[    9.472740]  kunit_generic_run_threadfn_adapter+0x28/0x4c
[    9.472835]  kthread+0x11c/0x120
[    9.472903]  ret_from_fork+0x10/0x20
[    9.473146] Code:     ()
[    9.473505] ---[ end trace  ]---