Greetings folks,

Similarly to a previous issue[0], we have been tracking an intermittent bug affecting CI jobs, and it seems the root cause has finally been identified.

The issue was that the provisioner config-update job was often failing with RETRY_LIMIT, and the zuul executor logs contained:

  DEBUG zuul.AnsibleJob: [build: XX] Ansible output: b"bwrap: Can't get type of source /tmp/ssh-YYY/agent.ZZZ: No such file or directory"

After investigating Zuul[1], then sf-ci[2] and finally bubblewrap[3], I was able to get a stable reproducer thanks to the zuul autohold feature.

It turns out that the OCI/RunC driver in Nodepool had an issue[4] which was breaking bubblewrap usage[5]: for a still unknown reason, the init playbook of the nodepool driver caused the root device to be mounted twice.

The CI should be fixed by https://softwarefactory-project.io/r/12676, and https://softwarefactory-project.io/r/12715 then fixes nodepool directly. Hopefully CI jobs will be much more stable now.

Cheers,
-Tristan

[0] https://www.redhat.com/archives/softwarefactory-dev/2016-December/msg00000.html
[1] https://softwarefactory-project.io/r/11655
[2] https://softwarefactory-project.io/r/#/c/12676/2
[3] https://softwarefactory-project.io/r/12678
[4] https://softwarefactory-project.io/r/12715
[5] https://github.com/projectatomic/bubblewrap/issues/273
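
For anyone who wants to check a held node for the double mount of the root device mentioned above, here is a minimal sketch. It is not taken from any of the linked reviews. It assumes the duplicate shows up as several entries for the same /dev/ source in /proc/self/mountinfo, which may not be exactly how the problem appeared on the CI nodes. The function names are hypothetical.

```python
from collections import defaultdict


def parse_mountinfo(text):
    """Yield (mount_point, source) pairs from /proc/<pid>/mountinfo content."""
    for line in text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # blank or malformed line
        # A lone "-" separates the optional fields from
        # "<fs type> <mount source> <super options>".
        sep = fields.index("-")
        if sep < 6 or len(fields) < sep + 3:
            continue
        yield fields[4], fields[sep + 2]


def devices_mounted_more_than_once(text):
    """Map each /dev/ source to its mount points when it has more than one.

    Assumption: only block devices under /dev/ are of interest, so pseudo
    filesystems such as proc or tmpfs are ignored.
    """
    mounts = defaultdict(list)
    for mount_point, source in parse_mountinfo(text):
        if source.startswith("/dev/"):
            mounts[source].append(mount_point)
    return {src: pts for src, pts in mounts.items() if len(pts) > 1}
```

On a node held with autohold, passing the content of /proc/self/mountinfo to devices_mounted_more_than_once() would list any device that appears more than once. Note that ordinary bind mounts of the same device are reported too, so a hit is only a hint, not a confirmation.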
_______________________________________________ Softwarefactory-dev mailing list [email protected] https://www.redhat.com/mailman/listinfo/softwarefactory-dev
