On 2/26/2026 10:01 PM, Tom Rini wrote:
> On Thu, Feb 26, 2026 at 05:30:06PM +0100, Stefan Eichenberger wrote:
>> Hi Francesco and Tom,
>>
>> On Thu, Feb 26, 2026 at 05:11:49PM +0100, Francesco Dolcini wrote:
>>> +Emanuele
>>>
>>> Hello Tom,
>>>
>>> On Thu, Feb 26, 2026 at 08:23:45AM -0600, Tom Rini wrote:
>>>> On Thu, Feb 26, 2026 at 08:05:02AM +0100, Francesco Dolcini wrote:
>>>>> Hello Tom,
>>>>>
>>>>> On Fri, Mar 14, 2025 at 11:06:49AM +0100, Stefan Eichenberger wrote:
>>>>>> From: Stefan Eichenberger <[email protected]>
>>>>>>
>>>>>> The get_ram_size() function fails to restore the original RAM data when
>>>>>> the data cache is enabled. This issue was observed on an AM625 R5 SPL
>>>>>> with 512MB of RAM and is a regression that became visible with
>>>>>> commit bc07851897bd ("board: ti: Pull redundant DDR functions to a common
>>>>>> location and Fixup DDR size when ECC is enabled").
>>>>>>
>>>>>> Observed boot failure messages:
>>>>>>   Warning: Did not detect image signing certificate. Skipping 
>>>>>> authentication to prevent boot failure. This will fail on Security 
>>>>>> Enforcing(HS-SE) devices
>>>>>>   Authentication passed
>>>>>>   Starting ATF on ARM64 core...
>>>>>>
>>>>>> The system then hangs. This indicates that without a data cache flush,
>>>>>> data in the cache is not coherent with RAM, preventing the system from
>>>>>> booting. This was verified by printing the content of this address when
>>>>>> the issue occurs.
>>>>>>
>>>>>> Add a data cache flush after each restore operation to resolve this
>>>>>> issue.
>>>>>>
>>>>>> Fixes: bc07851897bd ("board: ti: Pull redundant DDR functions to a 
>>>>>> common location and Fixup DDR size when ECC is enabled")
>>>>>> Fixes: 1c64b98c1ec4 ("common/memsize.c: Fix get_ram_size() when cache is 
>>>>>> enabled")
>>>>>> Signed-off-by: Stefan Eichenberger <[email protected]>
>>>>>
>>>>> Tom, can we merge this?
>>>>> This is the last bit to solve the regression reported here,
>>>>> https://lore.kernel.org/all/20260224152405.GD340942@francesco-nb/
>>>>
>>>> I wasn't happy with this at the time, and Stefan's last email in the
>>>> thread left me with the impression more investigation was needed and
>>>> likely something else was the root cause.
>>>
>>> I believe that this patch is needed.
>>>
>>> On AM62 what is happening is the following.
>>>
>>> We have a cortex-R5 that is the first core booting (there is also a
>>> cortex-m4, but it's not relevant for this discussion).
>>>
>>> It runs from internal memory and it configures the DDR ram
>>>
>>> We load to DDR memory various pieces of firmware (TFA, U-Boot for the
>>> cortex A53, ...)
>>>
>>> We do execute get_ram_size(), that read/write the memory, and it is
>>> supposed to restore it back the original content
>>>
>>> However when we have the cache enabled, we might miss to write back the
>>> original memory content, where the other pieces of firmware are.
>>>
>>> And after that we start the cortex A53, running in DDR, and there the
>>> memory content might not be correct, because there is no cache coherency
>>> between the cortex-A and the cortex-R. And because of that we have
>>> crashes.
>>>
>>> Stefan: any comment here? Can you help?
>>
>> I think what you wrote summarises the issue well. If I recall correctly,
>> I "fixed" the issue last time by simply calling get_ram_size() once
>> before enabling the cache. This was in commit 4164289db882e. The SPL
>> then informs U-Boot of the memory size via fdt fixup. However, something
>> has probably changed now (possibly in the R5 SPL), meaning the cache is
>> enabled earlier, so the cache is enabled again when get_ram_size() is
>> called.
>>
>> For the AMP use case, either "get_ram_size" should not be called once
>> the cache is enabled, or a similar patch to the one I proposed is
>> required.
> 
> I would lean towards the former if at all possible.
> 

Just trying to understand, what is the reasoning behind ensuring get_ram_size is
not called if cache is not enabled? Wasn't get_ram_size written with the
possibility of cache being enabled (existence of dcache_en logic); then this
patch is a valid fix right?

In parallel, I do agree we need to have a code analysis w.r.t dram_init, we are
making certain cache and dram calls spuriously making this confusing.

-- 
Thanking You
Neha Malcom Francis

Reply via email to