On 9/8/20 10:02 PM, Rick Chen wrote: > Hi Sean > >> On the K210, the prior stage bootloader does not clear IPIs. This presents >> a problem, because U-Boot up until this point assumes (with one exception) >> that IPIs are cleared when it starts. This series attempts to fix this in a >> robust manner, and fix several concurrency bugs I noticed while fixing >> these other areas. Heinrich previously submitted a patch addressing part of >> this problem in [1]. >> >> [1] >> https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.g...@gmx.de/ >> > > It sounds like that the bootloader does not deal with SMP flow well > and jump to u-boot-spl, right ? > > I have a question that why not try to fix the prior stage bootloader > to clear IPIs correctly?
Because it is a ROM :) > > Actually I have encounter a similar SMP issue like you. > Our prior stage bootloader will jump to u-boot-spl with the incorrect > mstatus and result in the SMP working abnormal in u-boot-spl. Perhaps we should just clear MIE then? I originally had a patch in this series which moved the handle_ipi code into handle_trap, and got rid of the manual checks on the interrupt. Something like secondary_hart_loop: wfi j secondary_hart_loop Of course as part of that we would need to explicitly enable and disable interrupts. Perhaps not the worst idea, but I didn't include it here because I figure the current system work OK, even if it is not what one might expect. > I mean this is an individual case, not a general case. > If we try to cover any errors which come from any prior stage bootloaders, > the SMP flow will become more and more complicated and hard to debug. Of course, if a prior bootloader doesn't hold up its end of the contract, we can be left with some awful bugs to fix. U-Boot is generally not too bad to debug, but I've had an awful time whenever some concurrency sneaks into the mix. I think it's much better to confine the (necessary) complexity to as few files as possible, so that the rest of the code can be ignorant. I think part of that is verifying that we have everything in a known state, so that when we see something unexpected, we can handle it/panic/whatever instead of silently getting a bug. --Sean