As suspected, here is a case[3] where the timing was unlucky, but in this case the same job voted twice so it was not a problem.

VPP Committers, please carefully review patches that have "Verified -1" followed by "Verified +1" without a patch upload or recheck.

This is a case where the "Verified -1" was due to an "Error cloning remote repo" failure with a job status of "NOTBUILT", and all other jobs passed. The retry of that job changed the vote to "Verified +1". For this patch there was no problem because it was the same job voting twice, but it could have been a different job that failed and had its vote overridden.
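
For anyone who wants to script this check, here is a minimal sketch in Python of the pattern to look for. It assumes the history of a change has already been reduced to a chronological list of events; the Event type, its "vote" / "upload" / "recheck" labels, and the function name are hypothetical and are not part of Gerrit or Jenkins.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Event:
    # Hypothetical event record: kind is "vote", "upload", or "recheck".
    kind: str
    # Verified vote value (-1 or +1); only meaningful when kind == "vote".
    value: int = 0


def has_unexplained_flip(events: Iterable[Event]) -> bool:
    """Return True if a Verified -1 is followed by a Verified +1
    with no patch upload or recheck in between."""
    pending_failure = False
    for ev in events:
        if ev.kind in ("upload", "recheck"):
            # A new patch set or an explicit recheck legitimately resets the vote.
            pending_failure = False
        elif ev.kind == "vote":
            if ev.value < 0:
                pending_failure = True
            elif ev.value > 0 and pending_failure:
                # +1 arrived after -1 without an upload or recheck.
                return True
    return False


# Usage: a retry flipped the vote without any upload or recheck.
history = [Event("upload"), Event("vote", -1), Event("vote", 1)]
print(has_unexplained_flip(history))
```

A flagged change is not necessarily bad (the same job may simply have voted twice, as in [3]), but it deserves a manual look at the individual job results before merging.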

I will remove Naginator as soon as the connection reset issue has been resolved.

Thanks,
-daw-

[3] https://gerrit.fd.io/r/c/vpp/+/32167

On 5/4/2021 2:12 PM, Dave Wallace via lists.fd.io wrote:
Here is a case where the process worked as desired. The job which failed [0] was retried [1] after 5 seconds and passed upon retry.  It did not disrupt the voting for the patch [2] :)

Hopefully this will always be the case. The job failure did not show up in the gerrit log, which I think is different from past behavior. However, based on previous Naginator-induced voting irregularities, I suspect that it may cause voting anomalies if the timing is unlucky.

I will continue to monitor for connection resets and voting anomalies, but so far so good.

Thanks,
-daw-

[0] https://jenkins.fd.io/job/vpp-verify-master-centos8-x86_64/3335/
[1] https://jenkins.fd.io/job/vpp-verify-master-centos8-x86_64/3336/
[2] https://gerrit.fd.io/r/c/vpp/+/32206

On 5/4/2021 1:26 PM, Dave Wallace via lists.fd.io wrote:
Folks,

As a temporary measure to help alleviate the burden of rechecking gerrit changes and wasting cycles re-running jobs which have already passed, I have deployed the Naginator Jenkins plugin configuration to retry VPP jobs which fail with the error signature "Error cloning remote repo" [0].

You may recall there is a potential for jobs which are restarted after a failed job to override a -1 vote by a job which failed prior to the restart.

Please look for this when reviewing the status of gerrit changes prior to merge, as merging a change whose failing vote was overridden could break the CI pipeline.

Vexxhost is monitoring the network segments using tcpdump to determine what device is causing the TCP connection resets and once the issue is resolved I will remove the Naginator configuration from the VPP job configuration.

I will continue to closely monitor job status for connection resets and also look for any Naginator-induced issues with job retries.

Thanks,
-daw-

[0] https://gerrit.fd.io/r/c/ci-management/+/32197





