On 10/26/2016 02:07 AM, André Przywara wrote:
On 25/10/16 08:52, Alexander Graf wrote:

Hi Alex,

thanks for looking at this!

On 25/10/2016 02:51, Andre Przywara wrote:
On systems using the generic timer routines defined in lib/time.c we
use timebase_l and timebase_h fields from the gd to detect wraparounds
in our tick counter. The tick calculcation algorithm silently assumes
that a long is only 32 bits, which leads to wrong results when timebase_h
is not 0 on 64-bit systems.
As one possible fix lets initialize timebase_h (and timebase_l) to 0, so
on 64-bit systems timebase_h will never(TM) be bigger than 0 and thus
cannot spoil timebase_l by being ORed into it.

This fixes occasional timeout issues on the Pine64 (and possibly other
ARM64 systems).
Well, not really (this fix isn't sufficient), but read on ...

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Hi,

I am bit puzzled what the proper fix is, this one is the easiest I could
come up with. Not sure if the gd should be zeroed normally (and it's just
broken on sunxi/arm64 because of some linker issues) or whether we really
forgot to initialize those fields and just got away with it.
The gd clearing happens via crt0_64.S -> board_init_f_init_reserve().
There we should have fully cleared all memory associated with global data.
Ah,thanks for pointing to that. I was a bit clueless where to start
looking for it - "git grep gd" is obviously not a good idea ;-)

I can't see anything obviously wrong in that code. Could you try to dump
gd if the timer offsets are != 0 on init? Maybe we can conclude
something from the data in it.
So I agree that this code looks sane and indeed in all my dumps it looks
like the initialization works fine.
I did some more debugging and learned that the increased timebase_h
comes from detected overflows: in fact some readings are really lower
than previous ones:
...
MMC:   SUNXI SD/MMC: 0
get_ticks() overflow: now: 118046720, tbl: 118063103, tbh: 0
get_ticks() overflow: now: 118439936, tbl: 118456318, tbh: 1
get_ticks() overflow: now: 118571008, tbl: 118587391, tbh: 2
get_ticks() overflow: now: 118734848, tbl: 118751230, tbh: 3
get_ticks() overflow: now: 119422976, tbl: 119439358, tbh: 4
get_ticks() overflow: now: 119783424, tbl: 119799807, tbh: 5
get_ticks() overflow: now: 120045568, tbl: 120061950, tbh: 6
get_ticks() overflow: now: 120406016, tbl: 120422398, tbh: 7
*** Warning - bad CRC, using default environment
......
Not sure how this actually happens - I am not aware of any such severe
hardware errata in the A53 r0p4 or the timer, also I think that would
have bitten us elsewhere already.
As ATF keeps the secondaries in WFI, U-Boot only runs on CPU0 (confirmed
by MPIDR reads).
Also U-Boot reads the physical counter, so a bogus CNTVOFF can also not
be the culprit.

So I can fix this annoying issue by using one of the other proposed
fixes (handling timebase_h only if BITS_PER_LONG < 64 or defining
get_ticks in armv8/generic_timer.c), but it would still be interesting
to find the real root cause.

Can you try and ask around? If this bites us in U-Boot, there's a good chance Linux timers should be broken too, no?

I remember that NXP had similar problems with the timer:

  https://patchwork.kernel.org/patch/9344727/


Alex

_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot

Reply via email to