Hi, I'm trying to bootstrap a disconnected (air-gapped) 4.2 cluster using the bare metal method <https://docs.openshift.com/container-platform/4.2/installing/installing_restricted_networks/installing-restricted-networks-bare-metal.html>. The hosts are technically VMware VMs, but I'm following the bare metal instructions because our VMware cluster wasn't quite compatible with the VMware instructions.
After a few false starts I managed to get the bootstrapping to start. One strange thing was that it tried to download images from "quay.io/openshift-release-dev/ocp-v4.0-art-dev" instead of the documented "quay.io/openshift-release-dev/ocp-release". I found this rather odd, and I couldn't find many references to "ocp-v4.0-art-dev" on the internet, so I'm not sure exactly where it came from. I ran "strings openshift-install | grep ocp-v4.0-art-dev" but that didn't show anything, so it's a bit of a strange one. So my image content sources ended up being:

    imageContentSources:
    - mirrors:
      - <bastion_host_name>:5000/<repo_name>/release
      source: quay.io/openshift-release-dev/ocp-release
    - mirrors:
      - <bastion_host_name>:5000/<repo_name>/release
      source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
    - mirrors:
      - <bastion_host_name>:5000/<repo_name>/release
      source: registry.svc.ci.openshift.org/ocp/release

I was watching journalctl on the bootstrap server and saw each etcd server join one by one. Once they had all joined, the apiserver on the bootstrap server seemed to lock up: when I tried to connect to https://localhost:6443 the connections would hang.

Initially, I thought this meant that bootstrap had completed, but then I noticed that none of the master nodes were listening on 6443. They were all trying to look themselves up in etcd at "api-int.<cluster_name>.<base_domain>", but nothing was listening there.

I then scoured the journal on the bootstrap node, but I struggled to find logs explaining why the apiserver had disappeared. The journal was mostly full of the bootstrap node trying to connect to https://localhost:6443, which suggested to me that bootstrap was not yet complete.

I tried rebooting the bootstrap node, but I think that made it worse. It seemed to be in a crash loop, whinging about files in /etc/kubernetes already existing or something like that.
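
In case it helps anyone reproduce the symptom, here is a minimal sketch of how one could tell "the connection hangs" apart from "nothing is listening" on port 6443. The function name probe_port and the 3-second timeout are my own choices for illustration, not anything from the installer, and it only checks the TCP connection, not whether the apiserver answers TLS or HTTP requests afterwards.

```python
import socket


def probe_port(host: str, port: int = 6443, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port.

    Returns one of:
      "open"    - the TCP handshake completed (something is listening)
      "refused" - the host answered, but nothing is listening on the port
      "timeout" - no answer within `timeout` seconds (the "hang" case)
      "error"   - any other failure, e.g. the name does not resolve
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # Covers DNS failures (socket.gaierror) and unreachable networks.
        return "error"
```

Calling probe_port("localhost") on the bootstrap node, and probe_port with each master's address or the api-int name, would show which of the four cases applies on each host. A result of "open" followed by a hanging HTTPS request would point at the apiserver process rather than the network.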
I had a look through /var/log and found this error message in some pod logs:

    exiting because of error: log: unable to create log: open /var/log/bootstrap-control-plane/kube-apiserver.log: permission denied

I'm not sure if that error appears because I restarted before bootstrap was successful, or if it is an actual problem. I tried reinstalling from scratch a few times, and it always got stuck in the same place, so it doesn't seem to be transient.

Where can I look for errors? Is "ocp-v4.0-art-dev" an indication of a problem?

Since it's an air-gapped environment it's difficult to get logs out of the system, so I don't know if I'll be able to use must-gather. In any case, if I'm understanding it correctly, must-gather can only be used after bootstrap has succeeded.

Thoughts?
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users