Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

Sean Anderson Tue, 01 Sep 2020 04:15:59 -0700

On 9/1/20 6:51 AM, Heinrich Schuchardt wrote:
> On 8/18/20 11:00 AM, Heinrich Schuchardt wrote:
>> On 11.08.20 12:32, Sean Anderson wrote:
>>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>>>> On 11.08.20 08:20, Rick Chen wrote:
>>>>> Hi Heinrich
>>>>>
>>>>>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
>>>>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>>>>> To: Rick Jian-Zhi Chen(陳建志)
>>>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel 
>>>>>> Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>>>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>>>>
>>>>>> At least on the Kendryte K210:
>>>>>>
>>>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>>>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>>>>
>>>>>> We should not jump to 0x0 to handle an interrupt.
>>>>>
>>>>> Can you explain why K210 be affected by it ?
>>>>
>>>> I have been running U-Boot on the MaixDuino.
>>>>
>>>> Without this patch secondary_hart_loop() is reached only once. With the
>>>> patch it is reached several thousand times.
>>>
>>> Hm, interesting. To me, this is a symptom of something else going
>>> terribly wrong. I originally had this check in place so that it would
>>> be easier to detect these sorts of errors. I don't think this is the
>>> correct fix, however. We should really try and find the root cause of
>>> the bug.
>>>
>>>> I would not expect NULL to contain any code that should be executed by
>>>> the secondary hart. See doc/board/sipeed/maix.rst:
>>>>
>>>> Address    Size      Description
>>>> ========== ========= ===========
>>>> 0x00000000 0x1000    debug
>>>> 0x00001000 0x1000    rom
>>>> 0x02000000 0xC000    clint
>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.g...@gmx.de>
>>>>>> ---
>>>>>>   arch/riscv/lib/smp.c | 2 ++
>>>>>>   1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
>>>>>> ac22136314..d725fa32e8 100644
>>>>>> --- a/arch/riscv/lib/smp.c
>>>>>> +++ b/arch/riscv/lib/smp.c
>>>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>>>>                  return;
>>>>>>          }
>>>>>>
>>>>>> +       if (!smp_function)
>>>>>> +               return;
>>>>>>          smp_function(hart, gd->arch.ipi[hart].arg0, 
>>>>>> gd->arch.ipi[hart].arg1);  }
>>>>>
>>>>> I remember Sean add this check in
>>>>> [v10,14/21] riscv: Clean up IPI initialization code
>>>>> . And I ask him to remove.
>>>>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/
>>>>
>>>> Your comment was: "Please remove the sanity check. If zero address is
>>>> the intended jump point, then system will work abnormal."
>>>>
>>>> The only place where gd->arch.ipi[hart].addr is set is:
>>>>
>>>> arch/riscv/lib/smp.c:53 send_ipi_many():
>>>> gd->arch.ipi[reg].addr = ipi->addr;
>>>>
>>>> send_ipi_many() is only called in smp_call_function().
>>>>
>>>> So the line
>>>>
>>>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>>>
>>>> can only work if smp_function() has been called before this point at any
>>>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>>>
>>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
>>> in [2] as part of a larger revert. The actual revert-of-revert is in
>>> [3], though it really should be split out into its own patch. The
>>> original commit making these changes is [4].
>>
>> Patch series [1] makes not difference. You still have
>>
>> gd->arch.ipi[0].addr= 0x0000000000000000
>> gd->arch.ipi[1].addr= 0x0000000000000000
>>
>> and the secondary hart jumps to NULL and never returns.
>>
>>>
>>> Note that the situation before [4] was that the IPI got initialized by
>>> the first hart to call any IPI function. If that hart was not the boot
>>> hart, Bad Things started to happen (e.g. race conditions, memory
>>> corruption, etc). In that patch, I moved the initialization to its own
>>> function so we would not have any race conditions and instead have
>>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
>>> course, when everything is working properly, we should get neither of
>>> these.
>>>
>>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
>>> [2] 
>>> https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/
>>> [3] 
>>> https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/
>>> [4] 
>>> https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/
>>>
>>>> Why do we call handle_ipi() on the secondary hart at all?
>>>
>>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
>>> you're getting at.
>>>
>>
>> I cannot see anything to be done by a secondary hart in case of a
>> software interrupt.
>>
>> Best regards
>>
>> Heinrich
> 
> When resetting the sipeed_maix_bitm_defconfig without the patch I see a
> crash:
> 
> => reset
> resetting ...
> Unhandled exception: Illegal instruction
> EPC: 0000000000000000 RA: 000000008000021a TVAL: 00000002d947c509
> ### ERROR ### Please RESET the board ###


Hm, interesting. I don't see that error. What commit are you testing with?

> 
> Objdump shows that it is related to the secondary_hart_loop:
> 
>         j       secondary_hart_loop
>     8000021a:   b7ed                    j       80000204
> <secondary_hart_loop>
> 
> This crash is avoided with this patch.
> 
> How do we get forward?
> 
> Best regards
> 
> Heinrich
> 
>>
>>>> Where do you expect it to jump if U-Boot is the first binary loaded?
>>>>
>>>> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
>>>> address to secondary_hart_relocate(), do we want the secondary hart to
>>>> execute it again and again?
>>>
>>> --Sean
>>>
>>
>

Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

Reply via email to