Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

Heinrich Schuchardt Tue, 01 Sep 2020 03:53:22 -0700

On 8/18/20 11:00 AM, Heinrich Schuchardt wrote:

On 11.08.20 12:32, Sean Anderson wrote:

On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:

On 11.08.20 08:20, Rick Chen wrote:

Hi Heinrich

From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
Sent: Tuesday, August 11, 2020 11:57 AM
To: Rick Jian-Zhi Chen(陳建志)
Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; 
u-boot@lists.denx.de; Heinrich Schuchardt
Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

At least on the Kendryte K210:

gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
gd->arch.ipi[1].arg1= 0x0000000000000000

We should not jump to 0x0 to handle an interrupt.


Can you explain why K210 be affected by it ?


I have been running U-Boot on the MaixDuino.

Without this patch secondary_hart_loop() is reached only once. With the
patch it is reached several thousand times.


Hm, interesting. To me, this is a symptom of something else going
terribly wrong. I originally had this check in place so that it would
be easier to detect these sorts of errors. I don't think this is the
correct fix, however. We should really try and find the root cause of
the bug.

I would not expect NULL to contain any code that should be executed by
the secondary hart. See doc/board/sipeed/maix.rst:

Address    Size      Description
========== ========= ===========
0x00000000 0x1000    debug
0x00001000 0x1000    rom
0x02000000 0xC000    clint


Signed-off-by: Heinrich Schuchardt <xypron.g...@gmx.de>
---
  arch/riscv/lib/smp.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
ac22136314..d725fa32e8 100644
--- a/arch/riscv/lib/smp.c
+++ b/arch/riscv/lib/smp.c
@@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
                 return;
         }

+       if (!smp_function)
+               return;
         smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  
}


I remember Sean add this check in
[v10,14/21] riscv: Clean up IPI initialization code
. And I ask him to remove.
https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/


Your comment was: "Please remove the sanity check. If zero address is
the intended jump point, then system will work abnormal."

The only place where gd->arch.ipi[hart].addr is set is:

arch/riscv/lib/smp.c:53 send_ipi_many():
gd->arch.ipi[reg].addr = ipi->addr;

send_ipi_many() is only called in smp_call_function().

So the line

smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);

can only work if smp_function() has been called before this point at any
time. The start code only calls it in spl_secondary_hart_stack_gd_setup().


Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
in [2] as part of a larger revert. The actual revert-of-revert is in
[3], though it really should be split out into its own patch. The
original commit making these changes is [4].


Patch series [1] makes not difference. You still have

gd->arch.ipi[0].addr= 0x0000000000000000
gd->arch.ipi[1].addr= 0x0000000000000000

and the secondary hart jumps to NULL and never returns.


Note that the situation before [4] was that the IPI got initialized by
the first hart to call any IPI function. If that hart was not the boot
hart, Bad Things started to happen (e.g. race conditions, memory
corruption, etc). In that patch, I moved the initialization to its own
function so we would not have any race conditions and instead have
(easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
course, when everything is working properly, we should get neither of
these.

[1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
[2] 
https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/
[3] 
https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/
[4] 
https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/

Why do we call handle_ipi() on the secondary hart at all?


Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
you're getting at.


I cannot see anything to be done by a secondary hart in case of a
software interrupt.

Best regards

Heinrich


When resetting the sipeed_maix_bitm_defconfig without the patch I see a
crash:

=> reset
resetting ...
Unhandled exception: Illegal instruction
EPC: 0000000000000000 RA: 000000008000021a TVAL: 00000002d947c509
### ERROR ### Please RESET the board ###

Objdump shows that it is related to the secondary_hart_loop:

        j       secondary_hart_loop
    8000021a:   b7ed                    j       80000204
<secondary_hart_loop>

This crash is avoided with this patch.

How do we get forward?

Best regards

Heinrich

Where do you expect it to jump if U-Boot is the first binary loaded?

Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
address to secondary_hart_relocate(), do we want the secondary hart to
execute it again and again?


--Sean

Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

Reply via email to