I've done some tinkering locally and mostly saw quic/vcl failures. With https://gerrit.fd.io/r/c/vpp/+/46003 these seem to go away, but a latent l2bd issue surfaces, which https://gerrit.fd.io/r/c/vpp/+/46004 papers over, although it's not a real fix.
Anyhow, I was able to pass the whole suite 4 times in a row before my ssh session dropped, and I didn't bother with more reruns. It looks like an improvement nevertheless.

Cheers,
Klement

On Fri, Jun 5, 2026 at 7:02 AM Klement Sekera via lists.fd.io wrote:
> Hi,
>
> OK, that sheds some light. Are you saying that in CI there is a box which
> runs N dockers at once and inside each of these there is a make test job?
> Are these dockers CPU pinned? If not, then that would be my first suspect:
> more than one docker on the same physical CPU, with the test framework
> doing more pinning than before, could make it more prone to timing issues.
>
> When tuning the patches I was sometimes hitting reproducible failures in
> CI, and I think I saw one of the logs mentioning a CPU frequency of 0.3GHz,
> which I found dubious, while at the same time it showed 128 available CPUs.
> I couldn't understand why it uses TEST_JOBS=4. But if the CI runs N of these
> at once, then that would make some sense. Maybe we could try sprinting in
> serial instead of walking in parallel to see if patterns emerge?
>
> Thanks,
> Klement
>
> > On 5 Jun 2026, at 03:59, Dave Wallace via lists.fd.io wrote:
> >
> > Klement,
> >
> > The failures are not random; they are intermittent. A number of tests
> > are being skipped (e.g. @tag_fixme_debian12) because they have repeatedly
> > failed in an intermittent and unreproducible pattern in the CI on
> > unrelated patchsets.
> >
> > Previous investigations failed to reproduce the issue when run over
> > 1000s of iterations on individual servers (both bare-metal and inside the
> > docker executor containers used in the CI). I have long suspected that
> > there are 'noisy neighbor' CPU pinning issues when a large number of
> > docker containers running verify jobs are packed onto a single nomad
> > client.
> >
> > For the past several months, the number of unrelated intermittent job
> > failures has been very low since the 'usual suspects' were elided from
> > running on Debian 12, where the majority of those failures had been
> > occurring. For whatever reason, the latest 'make test' changes have
> > exacerbated the issue.
> >
> > All of the '@tag_fixme_*' testcases which are elided from per-patch
> > testing represent technical debt which has been neglected for a very long
> > time. Any help you can provide to address this technical debt is most
> > appreciated.
> >
> > Thanks,
> > -daw-
> >
> >> On 6/4/26 13:43, Klement Sekera via lists.fd.io wrote:
> >> Hey,
> >>
> >> Could also be this one: https://gerrit.fd.io/r/c/vpp/+/45918/5
> >>
> >> These patches don't really change test behavior, only scheduling.
> >> Before 45918, with TEST_JOBS > 1, the pipeline would get underutilized
> >> "randomly", because at most one test class was scheduled per finished
> >> test class. So if a 1-cpu class followed a 4-cpu class, then 3 CPUs
> >> would sit idle. With this patch, the pipeline is refilled properly.
> >>
> >> I also noticed that any extra CPUs (like for vcl tests) were
> >> unaccounted for; that's what the later patch fixes.
> >>
> >> If the failures are "random", then it means the tests are flaky and
> >> either need to be fixed or marked for solo run as a temporary(?!)
> >> measure. I'd bet $.25 that these were fake-solo-run before due to
> >> pipeline underutilization.
> >>
> >> Regards,
> >> Klement
> >>
> >>>> On 4 Jun 2026, at 19:30, Dave Wallace via lists.fd.io wrote:
> >>>
> >>> Ole/Klement,
> >>>
> >>> Can you please help triage these new intermittent / non-patch related
> >>> test failures?
> >>>
> >>> The frequency of intermittent / non-patch related test failures has
> >>> spiked in the CI ever since Ole merged the batch of Klement's test
> >>> updates in gerrit.
> >>>
> >>> Here are some more that I encountered on my CI monitoring gerrit
> >>> change [0]:
> >>> https://github.com/FDio/vpp/actions/runs/26828951273/job/79105790972
> >>> https://github.com/FDio/vpp/actions/runs/26916331530/job/79436593001
> >>>
> >>> Thanks,
> >>> -daw-
> >>> [0] https://gerrit.fd.io/r/c/vpp/+/45941
> >>>
> >>>> On 6/4/26 11:58, Matus Fabian -X (matfabia - PANTHEON TECHNOLOGIES@Cisco) via lists.fd.io wrote:
> >>>> Hi,
> >>>>
> >>>> Today I noticed excess random failures of make test in CI, not related
> >>>> to the patch, across different jobs on a couple of patches.
> >>>> Some examples:
> >>>> https://github.com/FDio/vpp/actions/runs/26952741597/job/79522258003
> >>>> https://github.com/FDio/vpp/actions/runs/26935962088/job/79468040688
> >>>> https://github.com/FDio/vpp/actions/runs/26937415597/job/79470773670
> >>>> https://github.com/FDio/vpp/actions/runs/26937415597/job/79470773732
> >>>>
> >>>> Regards,
> >>>> Matus
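
To illustrate the scheduling difference described in the 4 June message about change 45918, here is a minimal Python sketch. It is not the VPP test framework's actual scheduler: the `QueuedTestClass` type, the `simulate` and `idle_cpu_time` helpers, and the CPU counts and durations are hypothetical, and the model assumes a strict FIFO queue with fixed per-class run times. It compares "start at most one class per finished class" with "refill the pipeline while the next class fits".

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedTestClass:
    # Hypothetical stand-in for a test class: its name, the CPUs it
    # reserves, and how long it runs (arbitrary time units).
    name: str
    cpus: int
    duration: int


def simulate(queue, total_cpus, refill):
    """Run a FIFO queue of test classes on total_cpus CPUs.

    refill=False: each finished class lets at most one new class start
                  (the pre-45918 behaviour as described in the thread).
    refill=True:  each finished class lets new classes start from the head
                  of the queue for as long as they fit in the free CPUs.
    Returns (makespan, start_times_by_name).
    """
    pending = list(queue)
    free = total_cpus
    now = 0
    running = []  # min-heap of (finish_time, sequence_no, cpus)
    starts = {}
    seq = 0

    def start_head():
        # Start the class at the head of the queue at the current time.
        nonlocal free, seq
        cls = pending.pop(0)
        free -= cls.cpus
        starts[cls.name] = now
        heapq.heappush(running, (now + cls.duration, seq, cls.cpus))
        seq += 1

    # Initial fill: both policies start as many classes as fit at t=0.
    while pending and pending[0].cpus <= free:
        start_head()

    # Advance from one class completion to the next.
    while running:
        now, _, cpus = heapq.heappop(running)
        free += cpus
        if refill:
            while pending and pending[0].cpus <= free:
                start_head()
        elif pending and pending[0].cpus <= free:
            start_head()

    return now, starts


def idle_cpu_time(queue, total_cpus, makespan):
    # CPU-time available for the whole run minus CPU-time actually used.
    return total_cpus * makespan - sum(c.cpus * c.duration for c in queue)


if __name__ == "__main__":
    # Assumed scenario: 4 CPUs, one 4-CPU class then four 1-CPU classes.
    queue = [QueuedTestClass("A", 4, 10)] + [
        QueuedTestClass(n, 1, 10) for n in "BCDE"
    ]
    for label, refill in (("old", False), ("new", True)):
        makespan, _ = simulate(queue, total_cpus=4, refill=refill)
        print(f"{label}: makespan={makespan}, "
              f"idle CPU-time={idle_cpu_time(queue, 4, makespan)}")
    # old: makespan=50, idle CPU-time=120
    # new: makespan=20, idle CPU-time=0
```

In this toy scenario the old policy runs the four 1-CPU classes one after another while 3 CPUs sit idle for 40 time units, which also shows how such classes could end up effectively running solo; the refill policy runs them side by side.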
View/Reply Online (#27042): https://lists.fd.io/g/vpp-dev/message/27042
