On 2/27/26 18:39, Tom Rini wrote:
> On Fri, Feb 27, 2026 at 11:39:44AM +0100, Emanuele Ghidoli wrote:
>>
>>
>> On 2/27/26 11:13, Francis, Neha wrote:
>>>
>>>
>>> On 2/26/2026 10:01 PM, Tom Rini wrote:
>>>> On Thu, Feb 26, 2026 at 05:30:06PM +0100, Stefan Eichenberger wrote:
>>>>> Hi Francesco and Tom,
>>>>>
>>>>> On Thu, Feb 26, 2026 at 05:11:49PM +0100, Francesco Dolcini wrote:
>>>>>> +Emanuele
>>>>>>
>>>>>> Hello Tom,
>>>>>>
>>>>>> On Thu, Feb 26, 2026 at 08:23:45AM -0600, Tom Rini wrote:
>>>>>>> On Thu, Feb 26, 2026 at 08:05:02AM +0100, Francesco Dolcini wrote:
>>>>>>>> Hello Tom,
>>>>>>>>
>>>>>>>> On Fri, Mar 14, 2025 at 11:06:49AM +0100, Stefan Eichenberger wrote:
>>>>>>>>> From: Stefan Eichenberger <[email protected]>
>>>>>>>>>
>>>>>>>>> The get_ram_size() function fails to restore the original RAM data
>>>>>>>>> when
>>>>>>>>> the data cache is enabled. This issue was observed on an AM625 R5 SPL
>>>>>>>>> with 512MB of RAM and is a regression that became visible with
>>>>>>>>> commit bc07851897bd ("board: ti: Pull redundant DDR functions to a
>>>>>>>>> common
>>>>>>>>> location and Fixup DDR size when ECC is enabled").
>>>>>>>>>
>>>>>>>>> Observed boot failure messages:
>>>>>>>>> Warning: Did not detect image signing certificate. Skipping
>>>>>>>>> authentication to prevent boot failure. This will fail on Security
>>>>>>>>> Enforcing(HS-SE) devices
>>>>>>>>> Authentication passed
>>>>>>>>> Starting ATF on ARM64 core...
>>>>>>>>>
>>>>>>>>> The system then hangs. This indicates that without a data cache flush,
>>>>>>>>> data in the cache is not coherent with RAM, preventing the system from
>>>>>>>>> booting. This was verified by printing the content of this address
>>>>>>>>> when
>>>>>>>>> the issue occurs.
>>>>>>>>>
>>>>>>>>> Add a data cache flush after each restore operation to resolve this
>>>>>>>>> issue.
>>>>>>>>>
>>>>>>>>> Fixes: bc07851897bd ("board: ti: Pull redundant DDR functions to a
>>>>>>>>> common location and Fixup DDR size when ECC is enabled")
>>>>>>>>> Fixes: 1c64b98c1ec4 ("common/memsize.c: Fix get_ram_size() when cache
>>>>>>>>> is enabled")
>>>>>>>>> Signed-off-by: Stefan Eichenberger <[email protected]>
>>>>>>>>
>>>>>>>> Tom, can we merge this?
>>>>>>>> This is the last bit to solve the regression reported here,
>>>>>>>> https://lore.kernel.org/all/20260224152405.GD340942@francesco-nb/
>>>>>>>
>>>>>>> I wasn't happy with this at the time, and Stefan's last email in the
>>>>>>> thread left me with the impression more investigation was needed and
>>>>>>> likely something else was the root cause.
>>>>>>
>>>>>> I believe that this patch is needed.
>>>>>>
>>>>>> On AM62 what is happening is the following.
>>>>>>
>>>>>> We have a cortex-R5 that is the first core booting (there is also a
>>>>>> cortex-m4, but it's not relevant for this discussion).
>>>>>>
>>>>>> It runs from internal memory and it configures the DDR ram
>>>>>>
>>>>>> We load to DDR memory various pieces of firmware (TFA, U-Boot for the
>>>>>> cortex A53, ...)
>>>>>>
>>>>>> We do execute get_ram_size(), that read/write the memory, and it is
>>>>>> supposed to restore it back the original content
>>>>>>
>>>>>> However when we have the cache enabled, we might miss to write back the
>>>>>> original memory content, where the other pieces of firmware are.
>>>>>>
>>>>>> And after that we start the cortex A53, running in DDR, and there the
>>>>>> memory content might not be correct, because there is no cache coherency
>>>>>> between the cortex-A and the cortex-R. And because of that we have
>>>>>> crashes.
>>>>>>
>>>>>> Stefan: any comment here? Can you help?
>>>>>
>>>>> I think what you wrote summarises the issue well. If I recall correctly,
>>>>> I "fixed" the issue last time by simply calling get_ram_size() once
>>>>> before enabling the cache. This was in commit 4164289db882e. The SPL
>>>>> then informs U-Boot of the memory size via fdt fixup. However, something
>>>>> has probably changed now (possibly in the R5 SPL), meaning the cache is
>>>>> enabled earlier, so the cache is enabled again when get_ram_size() is
>>>>> called.
>>>>>
>>>>> For the AMP use case, either "get_ram_size" should not be called once
>>>>> the cache is enabled, or a similar patch to the one I proposed is
>>>>> required.
>>>>
>>>> I would lean towards the former if at all possible.
>>>>
>>>
>>> Just trying to understand, what is the reasoning behind ensuring
>>> get_ram_size is
>>> not called if cache is not enabled? Wasn't get_ram_size written with the
>>> possibility of cache being enabled (existence of dcache_en logic); then this
>>> patch is a valid fix right?
>>>
>>> In parallel, I do agree we need to have a code analysis w.r.t dram_init, we
>>> are
>>> making certain cache and dram calls spuriously making this confusing.
>>>
>>
>> Hello Tom,
>> I agree with Francis.
>>
>> When I proposed commit 1c64b98c1ec4 ("common/memsize.c: Fix get_ram_size()
>> when cache is enabled"), I was not considering the presence of other actors
>> (other cores, DMA engines, etc.).
>>
>> That patch fixes what I had overlooked at the time. We need to restore the
>> actual RAM contents, not only what is perceived by the core executing
>> get_ram_size().
>>
>> To me this patch sounds intrinsically correct.
>
> Alright. Can I please get some Reviewed / Tested by tags? Thanks.
>
Reviewed-by: Emanuele Ghidoli <[email protected]>