Hi,

OK, that sheds some light. Are you saying that in CI there is a box which runs N Docker containers at once, and inside each of them there is a make test job? Are these containers CPU-pinned? If not, that would be my first suspect. If more than one container shares the same physical CPU, and the test framework now does more pinning than before, the tests could become more prone to timing issues.
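One quick way to answer the pinning question from inside a CI container is to compare two numbers: the CPUs the kernel reports, and the CPUs the process is actually allowed to run on. The following is a minimal Python sketch and is not part of the VPP test framework. It assumes a Linux host with cgroup v2 mounted at /sys/fs/cgroup, and the function names are hypothetical.

With Docker, --cpuset-cpus restricts the affinity mask, which is real pinning. By contrast, --cpus only sets a CFS quota (cpu.max in cgroup v2). Under a quota alone, a container can see every host CPU as allowed and still be throttled to a few CPUs' worth of time.

```python
import os
from typing import Optional, Tuple


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse a cgroup v2 cpu.max line ("<quota> <period>") into a CPU count.

    "max 100000" means no quota (returns None); "400000 100000" means the
    cgroup may use 4 CPUs' worth of time, spread across any allowed CPU.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def cpu_view(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> Tuple[int, int, Optional[float]]:
    """Return (visible CPUs, CPUs this process may run on, cgroup CPU quota)."""
    visible = os.cpu_count() or 0
    # sched_getaffinity is Linux-only; elsewhere assume all visible CPUs are allowed.
    if hasattr(os, "sched_getaffinity"):
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = visible
    try:
        with open(cpu_max_path) as f:
            quota = parse_cpu_max(f.read())
    except (OSError, ValueError):
        quota = None  # cgroup v1, or no readable cgroup v2 cpu.max
    return visible, allowed, quota


if __name__ == "__main__":
    visible, allowed, quota = cpu_view()
    print(f"visible={visible} allowed={allowed} quota={quota}")
    if allowed == visible:
        # No cpuset restriction: threads may land on any host CPU,
        # including CPUs busy with other containers' test jobs.
        print("not pinned (no cpuset restriction)")
```

Suppose "allowed" equals "visible" while a quota is set. In that case the container is time-sliced across all host CPUs rather than pinned. That is the kind of setup where a "noisy neighbor" effect between co-located containers would be plausible.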
When tuning the patches, I sometimes hit reproducible failures in CI. I think I saw one of the logs mentioning a CPU frequency of 0.3 GHz, which I found dubious, while at the same time it showed 128 available CPUs. I couldn’t understand why it uses TEST_JOBS=4, but if the CI runs N of these at once, that would make some sense. Maybe we could try sprinting in serial instead of walking in parallel to see if patterns emerge?

Thanks,
Klement

> On 5 Jun 2026, at 03:59, Dave Wallace via lists.fd.io <[email protected]> wrote:
>
> Klement,
>
> The failures are not random; they are intermittent. A number of tests are
> being skipped (e.g. @tag_fixme_debian12) because they have repeatedly failed
> in an intermittent and unreproducible pattern in the CI on unrelated
> patchsets.
>
> Previous investigations failed to reproduce the issue when run over 1000s of
> iterations on individual servers (both bare-metal and inside the Docker
> executor containers used in the CI). I have long suspected that there are
> 'noisy neighbor' CPU pinning issues when a large number of Docker containers
> running verify jobs are packed onto a single Nomad client.
>
> For the past several months, the number of unrelated intermittent job
> failures has been very low, since the 'usual suspects' were elided from
> running on Debian 12, where the majority of said failures had been occurring.
> For whatever reason, the latest 'make test' changes have exacerbated the
> issue.
>
> All of the '@tag_fixme_*' testcases which are elided from per-patch testing
> represent technical debt which has been neglected for a very long time. Any
> help you can provide to address this technical debt is most appreciated.
>
> Thanks,
> -daw-
>
>> On 6/4/26 13:43, Klement Sekera via lists.fd.io wrote:
>> Hey,
>>
>> Could also be this one:
>> https://www.google.com/url?q=https://gerrit.fd.io/r/c/vpp/%2B/45918/5&source=gmail-imap&ust=1781229569000000&usg=AOvVaw1JJF6ehJguLiWRXzA9IjT8
>>
>> These patches don’t really change test behavior, only scheduling. Before
>> 45918, with TEST_JOBS > 1, the pipeline would get underutilized “randomly”
>> because at most one test class was scheduled per finished test class. So if
>> a 1-CPU class followed a 4-CPU class, 3 CPUs would sit idle. With this
>> patch, the pipeline is refilled properly.
>>
>> I also noticed that any extra CPUs (like for VCL tests) were unaccounted
>> for; that’s what the later patch fixes.
>>
>> If the failures are “random”, then the tests are flaky and either need to
>> be fixed or marked for solo run as a temporary(?!) measure. I’d bet $.25
>> that these were fake-solo-run before due to pipeline underutilization.
>>
>> Regards,
>> Klement
>>
>>>> On 4 Jun 2026, at 19:30, Dave Wallace via lists.fd.io <[email protected]> wrote:
>>>
>>> Ole/Klement,
>>>
>>> Can you please help triage these new intermittent, non-patch-related test
>>> failures?
>>>
>>> The frequency of intermittent, non-patch-related test failures has spiked
>>> in the CI ever since Ole merged the batch of Klement's test updates in
>>> gerrit.
>>>
>>> Here are some more that I encountered on my CI monitoring gerrit change [0]:
>>> https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/FDio/vpp/actions/runs/26828951273/job/79105790972%26source%3Dgmail-imap%26ust%3D1781199056000000%26usg%3DAOvVaw386qhQrZeq8JYQ1UbilDnW&source=gmail-imap&ust=1781229569000000&usg=AOvVaw3PGqqnA_ct0OGCnittWUSp
>>> https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/FDio/vpp/actions/runs/26916331530/job/79436593001%26source%3Dgmail-imap%26ust%3D1781199056000000%26usg%3DAOvVaw0L8Z8McTXF6BqH53pZiW28&source=gmail-imap&ust=1781229569000000&usg=AOvVaw15myZGYglbDMEVHLT5QQtV
>>>
>>> Thanks,
>>> -daw-
>>> [0]
>>> https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://gerrit.fd.io/r/c/vpp/%252B/45941%26source%3Dgmail-imap%26ust%3D1781199056000000%26usg%3DAOvVaw2asscMzmXEXYxjjeddvXv_&source=gmail-imap&ust=1781229569000000&usg=AOvVaw0WhlBvDUXOSR6hJAlvAuzO
>>>
>>>> On 6/4/26 11:58, Matus Fabian -X (matfabia - PANTHEON TECHNOLOGIES@Cisco) via lists.fd.io wrote:
>>>> Hi,
>>>>
>>>> Today I noticed an excess of random make test failures in CI, unrelated
>>>> to the patches under test, across different jobs on a couple of patches.
>>>> Some examples:
>>>> https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/FDio/vpp/actions/runs/26952741597/job/79522258003%26source%3Dgmail-imap%26ust%3D1781199056000000%26usg%3DAOvVaw1ALoKUFN46rZlEmQpwjojU&source=gmail-imap&ust=1781229569000000&usg=AOvVaw1kUSq4aEGtENBFXkY6zziH
>>>> https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/FDio/vpp/actions/runs/26935962088/job/79468040688%26source%3Dgmail-imap%26ust%3D1781199056000000%26usg%3DAOvVaw3OQN_Dos849rFGWXIxyrEB&source=gmail-imap&ust=1781229569000000&usg=AOvVaw06JPcKzQob7_JBtq_CHu78
>>>> https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/FDio/vpp/actions/runs/26937415597/job/79470773670%26source%3Dgmail-imap%26ust%3D1781199056000000%26usg%3DAOvVaw31_vDtnQQNTHGlo2YfFNpn&source=gmail-imap&ust=1781229569000000&usg=AOvVaw1ZuiXoBRb0kJsLpW9_p1hR
>>>> https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/FDio/vpp/actions/runs/26937415597/job/79470773732%26source%3Dgmail-imap%26ust%3D1781199056000000%26usg%3DAOvVaw36UUipJ6sAUo4Fidziguv0&source=gmail-imap&ust=1781229569000000&usg=AOvVaw3rVFv-PKak8iSD1FvK00eq
>>>>
>>>> Regards,
>>>> Matus
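Klement's description of the scheduling change in 45918 can be illustrated with a toy simulation. This is a simplified Python model, not the code in VPP's test runner. The QueuedClass type, the simulate() function, strict FIFO ordering without backfilling, and the durations are all assumptions made for illustration.

In the model, each test class declares how many CPUs it needs, and TEST_JOBS is treated as the CPU budget. The only difference between the two strategies is how many queued classes may start each time a running class finishes.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QueuedClass:
    name: str
    cpus: int          # CPUs the class needs, including any extra ones (e.g. VCL)
    duration: float    # hypothetical run time, arbitrary units


def simulate(queue: List[QueuedClass], test_jobs: int, refill: bool) -> float:
    """Return the makespan of running queue in FIFO order within test_jobs CPUs.

    refill=False: start at most one queued class each time a class finishes.
    refill=True:  start as many queued classes as the free CPUs allow.
    """
    if any(qc.cpus > test_jobs for qc in queue):
        raise ValueError("a test class needs more CPUs than TEST_JOBS allows")
    pending = list(queue)
    running: List[Tuple[float, int, int]] = []  # heap of (end_time, seq, cpus)
    free = test_jobs
    now = 0.0
    seq = 0

    def start(limit: int) -> None:
        nonlocal free, seq
        started = 0
        # Strict FIFO: stop at the first class that does not fit (no backfilling).
        while pending and started < limit and pending[0].cpus <= free:
            qc = pending.pop(0)
            free -= qc.cpus
            heapq.heappush(running, (now + qc.duration, seq, qc.cpus))
            seq += 1
            started += 1

    start(limit=len(pending))  # the initial fill is the same for both strategies
    while running:
        now, _, cpus = heapq.heappop(running)  # advance to the next finishing class
        free += cpus
        start(limit=len(pending) if refill else 1)
    return now


if __name__ == "__main__":
    queue = [QueuedClass("A", 4, 10.0)] + [QueuedClass(n, 1, 10.0) for n in "BCDE"]
    print(simulate(queue, 4, refill=False), simulate(queue, 4, refill=True))
    # prints 50.0 20.0
```

Take one 4-CPU class followed by four 1-CPU classes of equal length. Under the one-per-finish strategy, each 1-CPU class runs alone on a 4-CPU budget with 3 CPUs idle. This matches the accidental "fake-solo-run" behaviour described in the thread. Refilling lets those classes overlap, so any timing sensitivity that only appears under concurrency now gets a chance to surface.

The model accounts for extra CPUs only because they are folded into cpus. If they were left out, the scheduler would start more work than the budget really allows.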
View/Reply Online (#27041): https://lists.fd.io/g/vpp-dev/message/27041
