Greetings folks,

As with a previous issue[0], we have been tracking an intermittent bug
affecting CI jobs, and it seems the root cause has finally been
identified.

The provisioner config-update job was often failing with RETRY_LIMIT,
and the zuul executor logs showed:

DEBUG zuul.AnsibleJob: [build: XX] Ansible output:
b"bwrap: Can't get type of source /tmp/ssh-YYY/agent.ZZZ: No such file or directory"
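For anyone hitting the same message: as far as I can tell, bwrap reports "Can't get type of source" when its stat() of a bind-mount source fails, here because the SSH agent socket was gone. The Python sketch below performs the same kind of check on a path such as the agent socket. describe_bind_source and demo are hypothetical helpers written for this email, not part of Zuul or bubblewrap.

```python
import os
import socket
import stat
import tempfile


def describe_bind_source(path: str) -> str:
    """Return the file type of a prospective bind-mount source.

    Hypothetical helper: it runs the same kind of stat() check that
    fails with bwrap's "Can't get type of source" error, and reports
    a missing path as "missing" instead of raising.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(st.st_mode):
        return "socket"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if stat.S_ISREG(st.st_mode):
        return "file"
    return "other"


def demo() -> None:
    # Create a throwaway UNIX socket, then unlink it, to show both outcomes.
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "agent.demo")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.bind(sock_path)
            print(sock_path, describe_bind_source(sock_path))  # socket
        os.unlink(sock_path)
        print(sock_path, describe_bind_source(sock_path))  # missing


if __name__ == "__main__":
    demo()
```

Pointing describe_bind_source at $SSH_AUTH_SOCK on an executor would show whether the agent socket still exists when a job starts.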

After investigating Zuul[1], then sf-ci[2], and finally bubblewrap[3], I
was able to get a stable reproducer thanks to the Zuul autohold feature.
It turns out that the OCI/RunC driver in Nodepool had an issue[4] that
was breaking bubblewrap[5]: for a reason that is still unknown, the
driver's init playbook caused the root device to be mounted twice.
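To check whether a node shows the same symptom, one option is to look for a mount point listed more than once in /proc/self/mountinfo. This is a minimal Python sketch assuming a Linux host. duplicate_mount_points is a hypothetical helper, and a duplicate entry is only a hint, since some hosts legitimately stack mounts (binfmt_misc, for example).

```python
from collections import Counter


def duplicate_mount_points(mountinfo: str) -> dict[str, int]:
    """Return mount points that appear more than once in mountinfo text.

    In /proc/<pid>/mountinfo the mount point is the fifth
    whitespace-separated field; spaces inside paths are escaped
    as \\040, so a plain split() keeps each field intact.
    """
    counts: Counter = Counter()
    for line in mountinfo.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        counts[fields[4]] += 1
    return {point: n for point, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Inspect the mount table of the current process (Linux only).
    try:
        with open("/proc/self/mountinfo") as f:
            dups = duplicate_mount_points(f.read())
    except FileNotFoundError:
        dups = None
        print("no /proc/self/mountinfo (not Linux?)")
    if dups is not None:
        if "/" in dups:
            print(f"root is mounted {dups['/']} times")
        print(dups or "no stacked mount points")
```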

The CI should be fixed by https://softwarefactory-project.io/r/12676,
and https://softwarefactory-project.io/r/12715 then fixes Nodepool
directly.

CI jobs should hopefully be much more stable now.

Cheers,
-Tristan


[0] https://www.redhat.com/archives/softwarefactory-dev/2016-December/msg00000.html
[1] https://softwarefactory-project.io/r/11655
[2] https://softwarefactory-project.io/r/#/c/12676/2
[3] https://softwarefactory-project.io/r/12678
[4] https://softwarefactory-project.io/r/12715
[5] https://github.com/projectatomic/bubblewrap/issues/273


_______________________________________________
Softwarefactory-dev mailing list
https://www.redhat.com/mailman/listinfo/softwarefactory-dev
