On 18/06/2024 23.03, Tim Harvey wrote:
> On Tue, Jun 18, 2024 at 7:32 AM Tom Rini <tr...@konsulko.com> wrote:
>>

> Stefan and Tom,
> 
> I'm seeing CI issues here even with 5000us [1]:
> host bind 0 /tmp/sandbox/persistent-cyclic function wdt-gpio-level
> took too long: 5368us vs 5000us max

Yes, 5ms is way too little when you're not the only thing running on the
cpu, which is why I went with 100ms.

Random thoughts and questions:

(1) Do we have any way to programmatically grab all the logs from Azure
CI, so we can get some kind of objective statistics on the number after
"took too long:"? Clicking through the web interface and randomly
searching is too painful.

It would also be helpful to know what percentage of CI runs have failed
due to that, versus due to some genuine error.
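
To make the kind of statistics I mean concrete, here is a minimal sketch in C.
It assumes the logs have already been downloaded as plain text by some other
means (how to fetch them from the CI is exactly the open question), and that
each warning sits on one line in the format quoted above. The names
parse_took_too_long(), overrun_stats and overrun_stats_feed() are made up for
illustration and exist nowhere in U-Boot.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Running totals over all "took too long" lines seen so far. */
struct overrun_stats {
	unsigned long count;		/* number of matching lines */
	unsigned long worst_us;		/* largest measured time */
	unsigned long long sum_us;	/* sum of measured times, for the mean */
};

/*
 * Extract the measured and the allowed time from a log line such as
 *   "... took too long: 5368us vs 5000us max"
 * Returns 1 if both numbers were found, 0 otherwise.
 */
static int parse_took_too_long(const char *line, unsigned long *took_us,
			       unsigned long *max_us)
{
	static const char marker[] = "took too long: ";
	const char *p;

	assert(line && took_us && max_us);

	p = strstr(line, marker);
	if (!p)
		return 0;

	return sscanf(p + strlen(marker), "%luus vs %luus",
		      took_us, max_us) == 2;
}

/* Update the totals if the line is a "took too long" warning. */
static void overrun_stats_feed(struct overrun_stats *s, const char *line)
{
	unsigned long took, max;

	assert(s);

	if (!parse_took_too_long(line, &took, &max))
		return;
	(void)max;	/* the limit is parsed but not aggregated here */

	s->count++;
	s->sum_us += took;
	if (took > s->worst_us)
		s->worst_us = took;
}
```

Feeding every line of every log through overrun_stats_feed() would give the
count, the worst case and (via sum_us / count) the mean overrun, which is
roughly what one would want before picking a new default.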

(2) I considered a patch that just added a

  default $something big if SANDBOX

to config CYCLIC_MAX_CPU_TIME_US, but since the problem also hit qemu, I
dropped that. But, if my patch is too ugly (and I might tend to think
that myself...), perhaps at least this would be an added improvement
over the generic bump to 5000us.

(3) I also thought that perhaps for sandbox, we should simply measure
the time using clock_gettime(CLOCK_PROCESS_CPUTIME_ID), instead of
wallclock time. But it's a little ugly to implement since the "now"
variable is both used to decide if it's time to run the callback, and as
a starting point for measuring cpu time, and we probably still want the
"is it time" to be measured on wallclock and not however much cpu-time
the u-boot process has been given. Or maybe we don't, and
CLOCK_PROCESS_CPUTIME_ID would simply be a better backend for
os_get_nsec(). Sure, time in the sandbox would progress slower than on
the host, but does that actually matter?
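
For illustration, a minimal host-side sketch of the distinction in (3). This
is not the sandbox code: clock_ns(), struct elapsed, time_callback() and
sleep_20ms() are hypothetical names, and CLOCK_MONOTONIC merely stands in for
whatever wallclock source the sandbox uses. It shows one callback being timed
against both clocks, so that time spent waiting or descheduled counts on the
wallclock side but not on the CPU-time side.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a POSIX clock as nanoseconds; returns 0 if the clock is unavailable. */
static uint64_t clock_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts))
		return 0;

	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

struct elapsed {
	uint64_t wall_ns;	/* elapsed wallclock time */
	uint64_t cpu_ns;	/* CPU time consumed by this process */
};

/* Run fn(ctx) once and report how long it took on both clocks. */
static struct elapsed time_callback(void (*fn)(void *), void *ctx)
{
	struct elapsed e;
	uint64_t wall0, cpu0;

	assert(fn);

	wall0 = clock_ns(CLOCK_MONOTONIC);
	cpu0 = clock_ns(CLOCK_PROCESS_CPUTIME_ID);

	fn(ctx);

	e.cpu_ns = clock_ns(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
	e.wall_ns = clock_ns(CLOCK_MONOTONIC) - wall0;

	return e;
}

/*
 * Example workload: sleeps for 20ms, so it takes wallclock time but burns
 * almost no CPU, much like a callback that got preempted by the host.
 */
static void sleep_20ms(void *ctx)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = 20 * 1000 * 1000 };

	(void)ctx;
	nanosleep(&req, NULL);
}
```

Calling time_callback(sleep_20ms, NULL) should report a wall_ns of at least
20ms and a much smaller cpu_ns; only the latter would be compared against
CYCLIC_MAX_CPU_TIME_US under this idea, while the "is it time" decision would
keep using the wallclock value.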

(4) Btw., what kind of clock tick do we even get when run under qemu? I
don't have much experience with qemu, but from quick googling it seems
that -icount would be interesting. Also see
https://github.com/zephyrproject-rtos/zephyr/issues/14173 . From quick
reading it seems there were some issues back in 2019, but that today it
mostly works for them, except some SMP issues (that are certainly not
relevant to U-Boot).

The current situation is a frustrating waste of developer and maintainer
time and CI resources.

Rasmus
