On 01.09.20 13:14, Sean Anderson wrote: > On 9/1/20 6:51 AM, Heinrich Schuchardt wrote: >> On 8/18/20 11:00 AM, Heinrich Schuchardt wrote: >>> On 11.08.20 12:32, Sean Anderson wrote: >>>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote: >>>>> On 11.08.20 08:20, Rick Chen wrote: >>>>>> Hi Heinrich >>>>>> >>>>>>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] >>>>>>> Sent: Tuesday, August 11, 2020 11:57 AM >>>>>>> To: Rick Jian-Zhi Chen(陳建志) >>>>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel >>>>>>> Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt >>>>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() >>>>>>> >>>>>>> At least on the Kendryte K210: >>>>>>> >>>>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr= >>>>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000 >>>>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr= >>>>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000 >>>>>>> gd->arch.ipi[1].arg1= 0x0000000000000000 >>>>>>> >>>>>>> We should not jump to 0x0 to handle an interrupt. >>>>>> >>>>>> Can you explain why K210 be affected by it ? >>>>> >>>>> I have been running U-Boot on the MaixDuino. >>>>> >>>>> Without this patch secondary_hart_loop() is reached only once. With the >>>>> patch it is reached several thousand times. >>>> >>>> Hm, interesting. To me, this is a symptom of something else going >>>> terribly wrong. I originally had this check in place so that it would >>>> be easier to detect these sorts of errors. I don't think this is the >>>> correct fix, however. We should really try and find the root cause of >>>> the bug. >>>> >>>>> I would not expect NULL to contain any code that should be executed by >>>>> the secondary hart. See doc/board/sipeed/maix.rst: >>>>> >>>>> Address Size Description >>>>> ========== ========= =========== >>>>> 0x00000000 0x1000 debug >>>>> 0x00001000 0x1000 rom >>>>> 0x02000000 0xC000 clint >>>>> >>>>>> >>>>>>> >>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.g...@gmx.de> >>>>>>> --- >>>>>>> arch/riscv/lib/smp.c | 2 ++ >>>>>>> 1 file changed, 2 insertions(+) >>>>>>> >>>>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index >>>>>>> ac22136314..d725fa32e8 100644 >>>>>>> --- a/arch/riscv/lib/smp.c >>>>>>> +++ b/arch/riscv/lib/smp.c >>>>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) >>>>>>> return; >>>>>>> } >>>>>>> >>>>>>> + if (!smp_function) >>>>>>> + return; >>>>>>> smp_function(hart, gd->arch.ipi[hart].arg0, >>>>>>> gd->arch.ipi[hart].arg1); } >>>>>> >>>>>> I remember Sean add this check in >>>>>> [v10,14/21] riscv: Clean up IPI initialization code >>>>>> . And I ask him to remove. >>>>>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ >>>>> >>>>> Your comment was: "Please remove the sanity check. If zero address is >>>>> the intended jump point, then system will work abnormal." >>>>> >>>>> The only place where gd->arch.ipi[hart].addr is set is: >>>>> >>>>> arch/riscv/lib/smp.c:53 send_ipi_many(): >>>>> gd->arch.ipi[reg].addr = ipi->addr; >>>>> >>>>> send_ipi_many() is only called in smp_call_function(). >>>>> >>>>> So the line >>>>> >>>>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); >>>>> >>>>> can only work if smp_function() has been called before this point at any >>>>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). >>>> >>>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi >>>> in [2] as part of a larger revert. The actual revert-of-revert is in >>>> [3], though it really should be split out into its own patch. The >>>> original commit making these changes is [4]. >>> >>> Patch series [1] makes not difference. You still have >>> >>> gd->arch.ipi[0].addr= 0x0000000000000000 >>> gd->arch.ipi[1].addr= 0x0000000000000000 >>> >>> and the secondary hart jumps to NULL and never returns. >>> >>>> >>>> Note that the situation before [4] was that the IPI got initialized by >>>> the first hart to call any IPI function. If that hart was not the boot >>>> hart, Bad Things started to happen (e.g. race conditions, memory >>>> corruption, etc). In that patch, I moved the initialization to its own >>>> function so we would not have any race conditions and instead have >>>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of >>>> course, when everything is working properly, we should get neither of >>>> these. >>>> >>>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047 >>>> [2] >>>> https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/ >>>> [3] >>>> https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/ >>>> [4] >>>> https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/ >>>> >>>>> Why do we call handle_ipi() on the secondary hart at all? >>>> >>>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what >>>> you're getting at. >>>> >>> >>> I cannot see anything to be done by a secondary hart in case of a >>> software interrupt. >>> >>> Best regards >>> >>> Heinrich >> >> When resetting the sipeed_maix_bitm_defconfig without the patch I see a >> crash: >> >> => reset >> resetting ... >> Unhandled exception: Illegal instruction >> EPC: 0000000000000000 RA: 000000008000021a TVAL: 00000002d947c509 >> ### ERROR ### Please RESET the board ### > > Hm, interesting. I don't see that error. What commit are you testing with?
origin/master 23e333a5c083a000d0cabc sipeed_maix_bitm_defconfig plus CONFIG_DEBUG_UART=y CONFIG_DEBUG_UART_SIFIVE=y CONFIG_DEBUG_UART_BASE=0x38000000 CONFIG_DEBUG_UART_CLOCK=390000000 on a Maixduino. Best regards Heinrich > >> >> Objdump shows that it is related to the secondary_hart_loop: >> >> j secondary_hart_loop >> 8000021a: b7ed j 80000204 >> <secondary_hart_loop> >> >> This crash is avoided with this patch. >> >> How do we get forward? >> >> Best regards >> >> Heinrich >> >>> >>>>> Where do you expect it to jump if U-Boot is the first binary loaded? >>>>> >>>>> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump >>>>> address to secondary_hart_relocate(), do we want the secondary hart to >>>>> execute it again and again? >>>> >>>> --Sean >>>> >>> >> > >