After a bit more digging and looking at other pod logs, I managed to find
some useful logs in the machine-config-daemon on one of the nodes.

The error is:

content mismatch for file
/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt:
-----BEGIN CERTIFICATE...

...certificate data...

Marking Degraded due to: unexpected on-disk state validating against
rendered-worker-987dsa987f98

When I ssh onto the node, I can see that
/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
already contains the certificates that I specified by following the
"setting up additional trusted CAs for builds"
<https://docs.openshift.com/container-platform/4.2/builds/setting-up-trusted-ca.html>
instructions.
However, when I try to pull an image via "sudo crictl pull
myprivate.registry:5001/image:tag", it complains about x509
certificates not being trusted. If I reboot the node, pulling via crictl
starts working, but the machine config operator remains degraded and keeps
reporting the above error. So it seems that the certificates are finding
their way onto the node via a different mechanism than the MCO.

This cluster is a disconnected cluster with some extra trusted CAs that
were configured during installation, so I'm wondering whether the content
mismatch in the MCO is related to merging the CA certs for images with the
certs inside the "user-ca-bundle" configmap in the "openshift-config"
namespace.
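
One way to check this hypothesis might be to compare the certificates in
the on-disk file with those in the bundle the MCO expects. Below is a
minimal Python sketch, assuming both bundles have already been saved as
PEM text. The function names are hypothetical, and the comparison is by
certificate content only, so it ignores ordering and whitespace.

```python
import base64
import hashlib
import re

# Matches each PEM certificate block and captures its base64 body.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----",
    re.DOTALL,
)


def pem_fingerprints(bundle_text):
    """Return the set of SHA-256 fingerprints of the certificates in a PEM bundle."""
    fingerprints = set()
    for body in PEM_BLOCK.findall(bundle_text):
        # Strip line breaks and spaces, then decode the base64 body to raw bytes.
        der = base64.b64decode("".join(body.split()))
        fingerprints.add(hashlib.sha256(der).hexdigest())
    return fingerprints


def compare_bundles(on_disk_text, expected_text):
    """Return (only_on_disk, only_expected) as two sets of fingerprints."""
    on_disk = pem_fingerprints(on_disk_text)
    expected = pem_fingerprints(expected_text)
    return on_disk - expected, expected - on_disk
```

If both returned sets are empty, the two bundles hold the same
certificates, and any remaining mismatch would presumably come from
ordering, duplicates, or whitespace rather than from extra or missing
certs.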

Any ideas?


On Tue, 18 Feb 2020 at 17:33, Joel Pearson <japear...@agiledigital.com.au>
wrote:

> Hi,
>
> I've been having trouble getting OpenShift to reliably accept CAs for
> custom secure registries.
> We've been following this guide:
> https://docs.openshift.com/container-platform/4.2/builds/setting-up-trusted-ca.html
>
> It has worked sometimes and not others. The most frustrating bit is
> not being able to figure out when the CA certificates have been applied:
> sometimes just waiting 5 minutes is enough, other times it never happens.
> I'm not sure which logs I need to watch to know that it has seen the
> change and done something.
>
> This article
> <https://docs.openshift.com/container-platform/4.2/openshift_images/image-configuration.html#images-configuration-insecure_image-configuration>
> says that the machine config operator (MCO) restarts nodes to apply the
> updates, but when I watch "oc get nodes", I don't see anything restarting.
> Still, sometimes the certificates seem to get applied anyway, somehow.
>
> Additionally, the MCO is degraded in the cluster, and it's not clear why.
> All I have managed to find so far is timeout error messages in the MCO pod,
> and then in the MCO cluster operator status, it just says it timed out
> waiting for them to sync, and that they're all unavailable.
>
> Where do I need to look to debug any errors related to the MCO?
>
> Any help or pointers would be appreciated.
>
> Thanks,
>
> Joel
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
