Hello everyone.

I am working on automating the "git bisect" process,
mainly for locating performance regressions and progressions
(it is also usable for locating breakages and fixes).
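
For context, here is a minimal sketch of the idea, assuming a
hypothetical helper "run_one_mrr_trial.sh" which builds the checked-out
VPP, runs one MRR trial on the testbed, and prints the packets-per-second
value (the real prototype is more involved, and the threshold has to be
picked per regression):

    #!/usr/bin/env python3
    """Driver for "git bisect run".
    Exit 0 = good (fast), 1 = bad (slow),
    125 = skip this revision (e.g. unrelated build failure)."""
    import statistics
    import subprocess
    import sys

    TRIALS = 5
    THRESHOLD_PPS = 6.5e6  # hypothetical cut-off between old and new MRR

    def measure_mrr():
        """Run one trial via the (hypothetical) helper, return packets/s."""
        out = subprocess.run(["./run_one_mrr_trial.sh"],
                             capture_output=True, text=True, check=True)
        return float(out.stdout.strip())

    def main():
        try:
            results = [measure_mrr() for _ in range(TRIALS)]
        except subprocess.CalledProcessError:
            return 125  # tell bisect to skip revisions that fail to build
        # Median, so a single outlier trial cannot flip the verdict.
        return 0 if statistics.median(results) >= THRESHOLD_PPS else 1

    if __name__ == "__main__":
        sys.exit(main())

It is then driven as usual:

    git bisect start <bad-commit> <good-commit>
    git bisect run ./bisect_mrr.py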

Of course, the process works correctly
only if the performance results are stable enough.
And we know from the per-patch perf VPP verify job
that many testcases are not reliable enough.
Something other than code quality is also influencing the results,
and repeated trials do not seem independent enough
for the usual statistical methods to work.
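
One quick way to check that independence assumption (a sketch in plain
Python, not anything from CSIT code) is the lag-1 autocorrelation of
back-to-back trial results; values far from zero mean consecutive
trials are correlated, e.g. by some slow environmental effect:

    def lag1_autocorr(xs):
        """Sample correlation between consecutive trial results.
        Near 0: trials look independent; near 1: they drift together."""
        n = len(xs)
        mean = sum(xs) / n
        num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
        den = sum((x - mean) ** 2 for x in xs)
        return num / den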

While testing the "bisection script" prototype,
I have noticed that results from the Denverton platform
show a smaller spread across trials,
enabling the script to locate smaller regressions reliably.

But when I tried the ip4base testcase (2-node testbed, meaning one
traffic generator and one VPP instance on a physical machine),
I found very surprising results.
The full log is at [0], but here are the results (MRR,
the number of 64B packets received in 1 second under line-rate load)
of 60 trials run back-to-back on the same VPP build and testbed:

[6058929.0, 6044436.0, 6051129.0, 6070631.0, 6056215.0, 6061268.0,
 6057762.0, 6059699.0, 6063921.0, 6066904.0, 6051627.0, 6055370.0,
 6047920.0, 6069624.0, 6054088.0, 6055737.0, 6047438.0, 6047390.0,
 6060160.0, 6052960.0, 6056360.0, 6055028.0, 6045457.0, 6060301.0,
 6058869.0, 6059033.0, 6059880.0, 9712980.0, 8810073.0, 6050160.0,
 6063784.0, 6057699.0, 6061905.0, 6059174.0, 6061494.0, 6057585.0,
 6043699.0, 6045381.0, 6048290.0, 6051779.0, 9009111.0, 8817494.0,
 8847234.0, 7014022.0, 7385958.0, 10867843.0, 10991701.0, 10926844.0,
 10971236.0, 6056055.0, 6048881.0, 6059600.0, 6037948.0, 6047664.0,
 6057797.0, 6053424.0, 6057050.0, 6044720.0, 6042256.0, 6054110.0]
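
If you want to play with the numbers, a few lines of Python split the
sample into the two clusters (6.5e6 is just an arbitrary cut-off I put
between the stable values and the outliers):

    import statistics

    trials = [
        6058929.0, 6044436.0, 6051129.0, 6070631.0, 6056215.0, 6061268.0,
        6057762.0, 6059699.0, 6063921.0, 6066904.0, 6051627.0, 6055370.0,
        6047920.0, 6069624.0, 6054088.0, 6055737.0, 6047438.0, 6047390.0,
        6060160.0, 6052960.0, 6056360.0, 6055028.0, 6045457.0, 6060301.0,
        6058869.0, 6059033.0, 6059880.0, 9712980.0, 8810073.0, 6050160.0,
        6063784.0, 6057699.0, 6061905.0, 6059174.0, 6061494.0, 6057585.0,
        6043699.0, 6045381.0, 6048290.0, 6051779.0, 9009111.0, 8817494.0,
        8847234.0, 7014022.0, 7385958.0, 10867843.0, 10991701.0, 10926844.0,
        10971236.0, 6056055.0, 6048881.0, 6059600.0, 6037948.0, 6047664.0,
        6057797.0, 6053424.0, 6057050.0, 6044720.0, 6042256.0, 6054110.0,
    ]
    low = [t for t in trials if t < 6.5e6]    # the stable ~6.05 Mpps cluster
    high = [t for t in trials if t >= 6.5e6]  # the outliers
    print(len(low), statistics.mean(low), statistics.stdev(low))
    print(len(high), sorted(high))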

The ~6.05 Mpps values are consistent and usually seen on every VPP build,
but the outlying values can reach almost double that,
and such values are quite rare.
The VPP build which produced the outliers
came from a change which only adds a "make test" test,
so I believe earlier builds can also produce them;
60 trials were simply not enough for an outlier to appear.

Has anybody seen similar results?
Does anybody have an idea of what could be happening inside VPP?
Can we fix VPP to be more consistent (ideally at the higher performance)?

Vratko.

[0] 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-csit-verify-perf-master-2n-dnv/6/console.log.gz