On 11.08.20 12:32, Sean Anderson wrote: > On 8/11/20 3:50 AM, Heinrich Schuchardt wrote: >> On 11.08.20 08:20, Rick Chen wrote: >>> Hi Heinrich >>> >>>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] >>>> Sent: Tuesday, August 11, 2020 11:57 AM >>>> To: Rick Jian-Zhi Chen(陳建志) >>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel >>>> Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt >>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() >>>> >>>> At least on the Kendryte K210: >>>> >>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr= >>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000 >>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr= >>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000 >>>> gd->arch.ipi[1].arg1= 0x0000000000000000 >>>> >>>> We should not jump to 0x0 to handle an interrupt. >>> >>> Can you explain why K210 be affected by it ? >> >> I have been running U-Boot on the MaixDuino. >> >> Without this patch secondary_hart_loop() is reached only once. With the >> patch it is reached several thousand times. > > Hm, interesting. To me, this is a symptom of something else going > terribly wrong. I originally had this check in place so that it would > be easier to detect these sorts of errors. I don't think this is the > correct fix, however. We should really try and find the root cause of > the bug. > >> I would not expect NULL to contain any code that should be executed by >> the secondary hart. See doc/board/sipeed/maix.rst: >> >> Address Size Description >> ========== ========= =========== >> 0x00000000 0x1000 debug >> 0x00001000 0x1000 rom >> 0x02000000 0xC000 clint >> >>> >>>> >>>> Signed-off-by: Heinrich Schuchardt <xypron.g...@gmx.de> >>>> --- >>>> arch/riscv/lib/smp.c | 2 ++ >>>> 1 file changed, 2 insertions(+) >>>> >>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index >>>> ac22136314..d725fa32e8 100644 >>>> --- a/arch/riscv/lib/smp.c >>>> +++ b/arch/riscv/lib/smp.c >>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) >>>> return; >>>> } >>>> >>>> + if (!smp_function) >>>> + return; >>>> smp_function(hart, gd->arch.ipi[hart].arg0, >>>> gd->arch.ipi[hart].arg1); } >>> >>> I remember Sean add this check in >>> [v10,14/21] riscv: Clean up IPI initialization code >>> . And I ask him to remove. >>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ >> >> Your comment was: "Please remove the sanity check. If zero address is >> the intended jump point, then system will work abnormal." >> >> The only place where gd->arch.ipi[hart].addr is set is: >> >> arch/riscv/lib/smp.c:53 send_ipi_many(): >> gd->arch.ipi[reg].addr = ipi->addr; >> >> send_ipi_many() is only called in smp_call_function(). >> >> So the line >> >> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); >> >> can only work if smp_function() has been called before this point at any >> time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). > > Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi > in [2] as part of a larger revert. The actual revert-of-revert is in > [3], though it really should be split out into its own patch. The > original commit making these changes is [4].
Patch series [1] makes not difference. You still have gd->arch.ipi[0].addr= 0x0000000000000000 gd->arch.ipi[1].addr= 0x0000000000000000 and the secondary hart jumps to NULL and never returns. > > Note that the situation before [4] was that the IPI got initialized by > the first hart to call any IPI function. If that hart was not the boot > hart, Bad Things started to happen (e.g. race conditions, memory > corruption, etc). In that patch, I moved the initialization to its own > function so we would not have any race conditions and instead have > (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of > course, when everything is working properly, we should get neither of > these. > > [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047 > [2] > https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/ > [3] > https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/ > [4] > https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/ > >> Why do we call handle_ipi() on the secondary hart at all? > > Presumably to handle the IPI it got sent? Sorry, I'm confused as to what > you're getting at. > I cannot see anything to be done by a secondary hart in case of a software interrupt. Best regards Heinrich >> Where do you expect it to jump if U-Boot is the first binary loaded? >> >> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump >> address to secondary_hart_relocate(), do we want the secondary hart to >> execute it again and again? > > --Sean >