Hi Tom,

On Wed, 12 Jul 2023 at 14:38, Tom Rini <tr...@konsulko.com> wrote:
>
> On Wed, Jul 12, 2023 at 02:32:18PM -0600, Simon Glass wrote:
> > Hi Tom,
> >
> > On Wed, 12 Jul 2023 at 11:09, Tom Rini <tr...@konsulko.com> wrote:
> > >
> > > On Wed, Jul 12, 2023 at 08:00:23AM -0600, Simon Glass wrote:
> > > > Hi Tom,
> > > >
> > > > On Tue, 11 Jul 2023 at 20:33, Tom Rini <tr...@konsulko.com> wrote:
> > > > >
> > > > > It is not uncommon for some of the QEMU-based jobs to fail not because
> > > > > of a code issue but rather because of a timing issue or similar problem
> > > > > that is out of our control. Make use of the keywords that Azure and
> > > > > GitLab provide so that we will automatically re-run these when they fail
> > > > > 2 times. If they fail that often it is likely we have found a real issue
> > > > > to investigate.
> > > > >
> > > > > Signed-off-by: Tom Rini <tr...@konsulko.com>
> > > > > ---
> > > > >  .azure-pipelines.yml | 1 +
> > > > >  .gitlab-ci.yml       | 1 +
> > > > >  2 files changed, 2 insertions(+)
> > > >
> > > > This seems like a slippery slope. Do we know why things fail? I wonder
> > > > if we should disable the tests / builders instead, until it can be
> > > > corrected?
> > >
> > > It happens in Azure, so it's not just the broken runner problem we have
> > > in GitLab. And the problem is timing, as I said in the commit.
> > > Sometimes we still get the RTC test failing. Other times we don't get
> > > QEMU + U-Boot spawned in time (most often m68k, but sometimes x86).
> >
> > How do we keep this list from growing?
>
> Do we need to? The problem is in essence since we rely on free
> resources, sometimes some heavy lifts take longer. That's what this
> flag is for.

I'm fairly sure the RTC test could be made deterministic. As for the
spawning problem, is there a timeout for that? What actually fails?

>
> > > > I'll note that we don't have this problem with sandbox tests.
> > >
> > > OK, but that's not relevant?
> >
> > It is relevant to the discussion about using QEMU instead of sandbox,
> > e.g. with the TPM. I recall a discussion with Ilias a while back.
>
> I'm sure we could make sandbox take too long to start as well, if enough
> other things are going on with the system. And sandbox has its own set
> of super frustrating issues instead, so I don't think this is a great
> argument to have right here (I have to run it in docker, to get around
> some application version requirements and exclude event_dump, bootmgr,
> abootimg and gpt tests, which could otherwise run, but fail for me).

I haven't heard about this before. Is there anything that could be done?

Regards,
Simon