Klement,

The failures are not random; they are intermittent. A number of tests are being skipped (e.g. @tag_fixme_debian12) because they have repeatedly failed in an intermittent and unreproducible pattern in the CI on unrelated patchsets.

Previous investigations failed to reproduce the issue over thousands of iterations on individual servers (both bare metal and inside the Docker executor containers used in the CI). I have long suspected 'noisy neighbor' CPU-pinning issues when a large number of Docker containers running verify jobs are packed onto a single Nomad client.

For the past several months, the number of unrelated intermittent job failures has been very low, since the 'usual suspects' were elided from running on Debian 12, where the majority of those failures had been occurring. For whatever reason, the latest 'make test' changes have exacerbated the issue.

All of the '@tag_fixme_*' test cases that are elided from per-patch testing represent technical debt that has been neglected for a very long time. Any help you can provide to address this technical debt is much appreciated.

Thanks,
-daw-

On 6/4/26 13:43, Klement Sekera via lists.fd.io wrote:
Hey,

Could also be this one: https://gerrit.fd.io/r/c/vpp/+/45918/5

These patches don’t really change test behavior, only scheduling. Before 45918,
with TEST_JOBS > 1, the pipeline would get underutilized “randomly”, because at
most one test class was scheduled per finished test class. So if a 1-cpu class
followed a 4-cpu class, 3 cpus would sit idle. With this patch, the pipeline is
refilled properly.
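To illustrate the difference described above, here is a minimal sketch, assuming each queued test class is just a (name, cpus) pair and a 4-cpu budget. The function names, class names and CPU counts are hypothetical; this is not the actual VPP test runner code from 45918.

```python
from collections import deque

# Each queued test class is (name, cpus_required). All names and numbers here
# are hypothetical illustrations, not the real VPP test runner.

def refill_one_per_finish(queue, free_cpus):
    """Old behaviour as described: start at most one class when a class finishes."""
    started = []
    if queue and queue[0][1] <= free_cpus:
        started.append(queue.popleft())
    return started

def refill_until_full(queue, free_cpus):
    """Refill behaviour: keep starting queued classes while the next one fits."""
    started = []
    while queue and queue[0][1] <= free_cpus:
        name, cpus = queue.popleft()
        started.append((name, cpus))
        free_cpus -= cpus
    return started

def idle_cpus(free_cpus, started):
    """CPUs left idle after starting the given classes."""
    return free_cpus - sum(cpus for _, cpus in started)

if __name__ == "__main__":
    # A 4-cpu class just finished, freeing all 4 CPUs; 1-cpu classes are queued.
    for refill in (refill_one_per_finish, refill_until_full):
        queue = deque([("B", 1), ("C", 1), ("D", 1), ("E", 1)])
        started = refill(queue, 4)
        print(refill.__name__, [n for n, _ in started], "idle:", idle_cpus(4, started))
```

With one-per-finish scheduling only "B" starts and 3 CPUs stay idle; with refilling, all four 1-cpu classes start and none are idle.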

I also noticed that any extra cpus (like for vcl tests) were unaccounted for - 
that’s what the later patch fixes.
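A rough sketch of why unaccounted extra cpus matter, assuming a test class declares its own cpus plus a hypothetical `extra_cpus` count for helper processes (e.g. VCL client/server apps). The class, field and function names are illustrative only and do not come from the VPP tree.

```python
from dataclasses import dataclass

@dataclass
class QueuedTestClass:
    name: str
    cpus: int            # CPUs requested for the test itself (hypothetical field)
    extra_cpus: int = 0  # extra CPUs, e.g. for VCL client/server apps (hypothetical field)

def planned_cost(tc, account_extra):
    """CPU cost the scheduler believes a class needs."""
    return tc.cpus + (tc.extra_cpus if account_extra else 0)

def schedule(classes, budget, account_extra):
    """Start classes in order while their planned cost fits the CPU budget."""
    started, planned = [], 0
    for tc in classes:
        cost = planned_cost(tc, account_extra)
        if planned + cost > budget:
            break
        started.append(tc)
        planned += cost
    return started

def real_usage(started):
    """CPUs actually consumed, extra CPUs included."""
    return sum(tc.cpus + tc.extra_cpus for tc in started)

if __name__ == "__main__":
    classes = [QueuedTestClass("vcl", 1, extra_cpus=2),
               QueuedTestClass("ip4", 1), QueuedTestClass("ip6", 1)]
    for account_extra in (False, True):
        started = schedule(classes, budget=4, account_extra=account_extra)
        print(account_extra, [tc.name for tc in started], "real usage:", real_usage(started))
```

Without accounting, all three classes start and really use 5 CPUs on a 4-cpu budget (oversubscription); with accounting, only "vcl" and "ip4" start and usage stays at 4.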

If the failures are “random”, then the tests are flaky and either need to be
fixed or marked for solo run as a temporary(?!) measure. I’d bet $.25 that
these were effectively running solo before, thanks to pipeline underutilization.

Regards,
Klement

On 4 Jun 2026, at 19:30, Dave Wallace via lists.fd.io <[email protected]> wrote:

Ole/Klement,

Can you please help triage these new intermittent, non-patch-related test
failures?

The frequency of intermittent, non-patch-related test failures has spiked in
the CI ever since Ole merged the batch of Klement's test updates in gerrit.

Here are some more that I encountered on my CI monitoring gerrit change [0]:
https://github.com/FDio/vpp/actions/runs/26828951273/job/79105790972
https://github.com/FDio/vpp/actions/runs/26916331530/job/79436593001

Thanks,
-daw-
[0] https://gerrit.fd.io/r/c/vpp/+/45941

On 6/4/26 11:58, Matus Fabian -X (matfabia - PANTHEON TECHNOLOGIES@Cisco) via lists.fd.io wrote:
Hi,

Today I noticed an excessive number of random 'make test' failures in the CI,
unrelated to the patch under test, across different jobs on a couple of patches.
Some examples:
https://github.com/FDio/vpp/actions/runs/26952741597/job/79522258003
https://github.com/FDio/vpp/actions/runs/26935962088/job/79468040688
https://github.com/FDio/vpp/actions/runs/26937415597/job/79470773670
https://github.com/FDio/vpp/actions/runs/26937415597/job/79470773732

Regards,
Matus
