Wei,

It did work before, so a routing change at our core must have broken it. I assume the routing change was the actual issue here; everything else was just ancillary.
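For anyone who hits the same symptom (TcpTestSucceeded on 6443 from outside, but TTL Expired from hosts on the same public range as the Virtual Router), a traceroute from the management host should make that kind of loop between the TOR/firewall/ISP hops visible. Treat the commands below as a sketch rather than what we ran; 99.xx.xx.xxx stands in for the cluster's public IP:

traceroute -n 99.xx.xx.xxx
# TCP probes against the Kubernetes API port (needs root)
sudo traceroute -T -n -p 6443 99.xx.xx.xxx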
Thanks for all the help, the clusters are working now!
-Wally

On Thu, Feb 15, 2024 at 10:07 AM Wei ZHOU <[email protected]> wrote:

> Hi,
>
> As I understand,
> 1. After upgrading, you need to patch the system vms or recreate them.
> Not a bug I think.
> 2. a minor issue which does not impact the provisioning and operation of
> CKS cluster.
> 3. Looks like a network misconfiguration, but did it work before?
>
> -Wei
>
> On Thu, 15 Feb 2024 at 16:39, Wally B <[email protected]> wrote:
>
> > As a quick add-on: after running those commands and getting the kubectl
> > commands working, the error in the management log is
> >
> > tail -f /var/log/cloudstack/management/management-server.log | grep ERROR
> >
> > 2024-02-15 14:09:41,124 ERROR [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-4:ctx-29ed2b8e job-12348 ctx-3355553d) (logid:ae448a2e) Failed to setup Kubernetes cluster : pz-dev-k8s-ncus-00001 in usable state as unable to access control node VMs of the cluster
> >
> > 2024-02-15 14:09:41,129 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-4:ctx-29ed2b8e job-12348) (logid:ae448a2e) Unexpected exception while executing org.apache.cloudstack.api.command.user.kubernetes.cluster.CreateKubernetesClusterCmd
> >
> > 2024-02-15 14:33:01,117 ERROR [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-17:ctx-0685d548 job-12552 ctx-997de847) (logid:fda8fc82) Failed to setup Kubernetes cluster : pz-dev-k8s-ncus-00001 in usable state as unable to access control node VMs of the cluster
> >
> > I did a quick Test-NetConnection from my PC to the control node and got:
> >
> > Test-NetConnection 99.xx.xx.xxx -p 6443
> >
> > ComputerName : 99.xx.xx.xxx
> > RemoteAddress : 99.xx.xx.xxx
> > RemotePort : 6443
> > InterfaceAlias : Ethernet
> > SourceAddress : xxx.xxx.xxx.xxx
> > TcpTestSucceeded : True
> >
> > So I ran the same test from my management hosts (on the same public IP
> > range as the Virtual Router public IP) and got a TTL Expired.
> >
> > To wrap it up, there were 3 issues:
> >
> > 1. Needed to delete and re-provision the Secondary Storage System
> > Virtual Machine after upgrading from 4.18.1 to 4.19.0
> > 2. Needed to fix additional control nodes not getting the kubeadm.conf
> > copied correctly (Wei's PR)
> > 3. Needed to fix some routing on our end, since we were bouncing between
> > our L3 TOR -> Firewall <- ISP Routers
> >
> > Thanks again for all the help, everyone!
> > Wally
> >
> > On Thu, Feb 15, 2024 at 7:24 AM Wally B <[email protected]> wrote:
> >
> > > Thanks Wei ZHOU!
> > >
> > > That fixed the kubectl command issue, but the cluster still just sits at
> > >
> > > Create Kubernetes cluster k8s-cluster-1 in progress
> > >
> > > Maybe this is just a UI issue? Unfortunately, if I stop the k8s cluster
> > > after it errors out it just stays in the error state:
> > >
> > > 1. Click Stop Kubernetes cluster
> > > 2. UI says it successfully stopped.
> > > 3. Try to start the cluster, but the power button still says Stop
> > > Kubernetes cluster and the UI status stays in the error state.
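Side note on the stuck Error state described above: the same stop/start operations can also be issued through CloudMonkey, which at least returns the API-side result directly instead of the UI spinner. The name and id below are placeholders, and I can't say whether this clears the stale UI state; it's just an alternative to the power button:

cmk list kubernetesclusters name=<cluster-name>
cmk stop kubernetescluster id=<cluster-uuid>
cmk start kubernetescluster id=<cluster-uuid>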
> > > > > > > > > On Thu, Feb 15, 2024 at 7:02 AM Wei ZHOU <[email protected]> > wrote: > > > > > >> Hi, > > >> > > >> Please run the following commands as root: > > >> > > >> mkdir -p /root/.kube > > >> cp -i /etc/kubernetes/admin.conf /root/.kube/config > > >> > > >> After then the kubectl commands should work > > >> > > >> -Wei > > >> > > >> On Thu, 15 Feb 2024 at 13:53, Wally B <[email protected]> wrote: > > >> > > >> > What command do you suggest I run? > > >> > > > >> > kubeconfig returns command not found > > >> > > > >> > on your PR I see > > >> > > > >> > kubeadm join is being called out as well but I wanted to verify what > > you > > >> > wanted me to test first. > > >> > > > >> > On Thu, Feb 15, 2024 at 2:41 AM Wei ZHOU <[email protected]> > > wrote: > > >> > > > >> > > Hi Wally, > > >> > > > > >> > > I think the cluster is working fine. > > >> > > The kubeconfig is missing in extra nodes. I have just created a PR > > for > > >> > it: > > >> > > https://github.com/apache/cloudstack/pull/8658 > > >> > > You can run the command on the control nodes which should fix the > > >> > problem. > > >> > > > > >> > > > > >> > > -Wei > > >> > > > > >> > > On Thu, 15 Feb 2024 at 09:31, Wally B <[email protected]> > > wrote: > > >> > > > > >> > > > 3 Nodes > > >> > > > > > >> > > > Control 1 -- No Errors > > >> > > > > > >> > > > kubectl get nodes > > >> > > > NAME STATUS ROLES > > >> > AGE > > >> > > > VERSION > > >> > > > pz-dev-k8s-ncus-00001-control-18dabdb141b Ready > control-plane > > >> > 2m6s > > >> > > > v1.28.4 > > >> > > > pz-dev-k8s-ncus-00001-control-18dabdb6ad6 Ready > control-plane > > >> > 107s > > >> > > > v1.28.4 > > >> > > > pz-dev-k8s-ncus-00001-control-18dabdbc0a8 Ready > control-plane > > >> > 108s > > >> > > > v1.28.4 > > >> > > > pz-dev-k8s-ncus-00001-node-18dabdc1644 Ready <none> > > >> > 115s > > >> > > > v1.28.4 > > >> > > > pz-dev-k8s-ncus-00001-node-18dabdc6c16 Ready <none> > > >> > 115s > > >> > > > v1.28.4 > > >> > > > > > >> > > > > > >> > > > kubectl get pods --all-namespaces > > >> > > > NAMESPACE NAME > > >> > > > READY STATUS RESTARTS AGE > > >> > > > kube-system coredns-5dd5756b68-g84vk > > >> > > > 1/1 Running 0 2m46s > > >> > > > kube-system coredns-5dd5756b68-kf92x > > >> > > > 1/1 Running 0 2m46s > > >> > > > kube-system > > >> etcd-pz-dev-k8s-ncus-00001-control-18dabdb141b > > >> > > > 1/1 Running 0 2m50s > > >> > > > kube-system > > >> etcd-pz-dev-k8s-ncus-00001-control-18dabdb6ad6 > > >> > > > 1/1 Running 0 2m16s > > >> > > > kube-system > > >> etcd-pz-dev-k8s-ncus-00001-control-18dabdbc0a8 > > >> > > > 1/1 Running 0 2m37s > > >> > > > kube-system > > >> > > > kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabdb141b > > >> > 1/1 > > >> > > > Running 0 2m52s > > >> > > > kube-system > > >> > > > kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabdb6ad6 > > >> > 1/1 > > >> > > > Running 1 (2m16s ago) 2m15s > > >> > > > kube-system > > >> > > > kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabdbc0a8 > > >> > 1/1 > > >> > > > Running 0 2m37s > > >> > > > kube-system > > >> > > > > kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabdb141b > > >> > 1/1 > > >> > > > Running 1 (2m25s ago) 2m51s > > >> > > > kube-system > > >> > > > > kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabdb6ad6 > > >> > 1/1 > > >> > > > Running 0 2m18s > > >> > > > kube-system > > >> > > > > kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabdbc0a8 > > >> > 1/1 > > >> > > > Running 0 2m37s > > >> > > > kube-system kube-proxy-445qx > > >> > > > 1/1 
Running 0 2m37s > > >> > > > kube-system kube-proxy-8swdg > > >> > > > 1/1 Running 0 2m2s > > >> > > > kube-system kube-proxy-bl9rx > > >> > > > 1/1 Running 0 2m47s > > >> > > > kube-system kube-proxy-pv8gj > > >> > > > 1/1 Running 0 2m43s > > >> > > > kube-system kube-proxy-v7cw2 > > >> > > > 1/1 Running 0 2m43s > > >> > > > kube-system > > >> > > > kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabdb141b > > >> > 1/1 > > >> > > > Running 1 (2m22s ago) 2m50s > > >> > > > kube-system > > >> > > > kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabdb6ad6 > > >> > 1/1 > > >> > > > Running 0 2m15s > > >> > > > kube-system > > >> > > > kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabdbc0a8 > > >> > 1/1 > > >> > > > Running 0 2m37s > > >> > > > kube-system weave-net-8dvl5 > > >> > > > 2/2 Running 0 2m37s > > >> > > > kube-system weave-net-c54bz > > >> > > > 2/2 Running 0 2m43s > > >> > > > kube-system weave-net-lv8l4 > > >> > > > 2/2 Running 1 (2m42s ago) 2m47s > > >> > > > kube-system weave-net-vg6td > > >> > > > 2/2 Running 0 2m2s > > >> > > > kube-system weave-net-vq9s4 > > >> > > > 2/2 Running 0 2m43s > > >> > > > kubernetes-dashboard > dashboard-metrics-scraper-5657497c4c-4k886 > > >> > > > 1/1 Running 0 2m46s > > >> > > > kubernetes-dashboard kubernetes-dashboard-5b749d9495-jpbxl > > >> > > > 1/1 Running 1 (2m22s ago) 2m46s > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > Control 2: Errors at the CLI > > >> > > > Failed to start Execute cloud user/final scripts. > > >> > > > > > >> > > > kubectl get nodes > > >> > > > E0215 08:27:07.797825 2772 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:27:07.798759 2772 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:27:07.801039 2772 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:27:07.801977 2772 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:27:07.804029 2772 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > The connection to the server localhost:8080 was refused - did > you > > >> > specify > > >> > > > the right host or port? 
> > >> > > > > > >> > > > kubectl get pods --all-namespaces > > >> > > > E0215 08:29:41.818452 2811 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:29:41.819935 2811 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:29:41.820883 2811 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:29:41.822680 2811 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:29:41.823571 2811 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > The connection to the server localhost:8080 was refused - did > you > > >> > specify > > >> > > > the right host or port? > > >> > > > > > >> > > > Ping Google: Success > > >> > > > Ping Control Node 1: Success > > >> > > > > > >> > > > > > >> > > > Control 3: Errors at the CLI > > >> > > > Failed to start Execute cloud user/final scripts. > > >> > > > Failed to start deploy-kube-system.service. > > >> > > > > > >> > > > kubectl get nodes > > >> > > > E0215 08:27:15.057313 2697 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:27:15.058538 2697 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:27:15.059260 2697 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:27:15.061599 2697 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:27:15.062029 2697 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > The connection to the server localhost:8080 was refused - did > you > > >> > specify > > >> > > > the right host or port? 
> > >> > > > > > >> > > > > > >> > > > kubectl get pods --all-namespaces > > >> > > > E0215 08:29:57.108716 2736 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:29:57.109533 2736 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:29:57.111372 2736 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:29:57.112074 2736 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > E0215 08:29:57.113956 2736 memcache.go:265] couldn't get > > current > > >> > > server > > >> > > > API group list: Get "http://localhost:8080/api?timeout=32s": > dial > > >> tcp > > >> > > > 127.0.0.1:8080: connect: connection refused > > >> > > > The connection to the server localhost:8080 was refused - did > you > > >> > specify > > >> > > > the right host or port? > > >> > > > > > >> > > > > > >> > > > Ping Google: Success > > >> > > > Ping Control Node 1: Success > > >> > > > > > >> > > > > > >> > > > On Thu, Feb 15, 2024 at 2:17 AM Wei ZHOU <[email protected] > > > > >> > wrote: > > >> > > > > > >> > > > > Can you try with 3 control nodes ? > > >> > > > > > > >> > > > > -Wei > > >> > > > > > > >> > > > > On Thu, 15 Feb 2024 at 09:13, Wally B <[email protected]> > > >> wrote: > > >> > > > > > > >> > > > > > - zone type : > > >> > > > > > Core > > >> > > > > > - network type: > > >> > > > > > Advanced > > >> > > > > > Isolated Network inside a Redundant VPC (same > results > > in > > >> > just > > >> > > > an > > >> > > > > > Isolated network without VPC) > > >> > > > > > - number of control nodes: > > >> > > > > > 2 Control Nodes (HA Cluster) > > >> > > > > > > > >> > > > > > We were able to deploy k8s in the past, not sure what > changed. > > >> > > > > > > > >> > > > > > Thanks! > > >> > > > > > -Wally > > >> > > > > > > > >> > > > > > On Thu, Feb 15, 2024 at 2:04 AM Wei ZHOU < > > [email protected] > > >> > > > >> > > > wrote: > > >> > > > > > > > >> > > > > > > Hi, > > >> > > > > > > > > >> > > > > > > can you share > > >> > > > > > > - zone type > > >> > > > > > > - network type > > >> > > > > > > - number of control nodes > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > -Wei > > >> > > > > > > > > >> > > > > > > On Thu, 15 Feb 2024 at 08:52, Wally B < > > [email protected]> > > >> > > wrote: > > >> > > > > > > > > >> > > > > > > > So > > >> > > > > > > > > > >> > > > > > > > Recreating the Sec Storage VM Fixed the Cert issue and I > > was > > >> > able > > >> > > > to > > >> > > > > > > > install K8s 1.28.4 Binaries. --- THANKS Wei ZHOU ! > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > Im still getting > > >> > > > > > > > > > >> > > > > > > > [FAILED] Failed to start Execute cloud user/final > scripts. > > >> > > > > > > > > > >> > > > > > > > on 1 control and 1 worker. 
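Note for anyone digging into the same failure: "Execute cloud user/final scripts" is the description of cloud-init's cloud-final.service, so the journal on the affected node is the quickest place to see what actually failed. A minimal check, assuming root SSH access to the node; the exact options are only a suggestion:

systemctl --failed
journalctl -u cloud-final --no-pager | tail -n 50
journalctl -u deploy-kube-system --no-pager | tail -n 50

deploy-kube-system is the CKS unit named in the failures quoted elsewhere in this thread.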
> > >> > > > > > > > > > >> > > > > > > > *Control 1 -- pz-dev-k8s-ncus-00001-control-18dabaf66c1 > > -- > > >> > > :* > > >> > > > No > > >> > > > > > > > errors at the CLI > > >> > > > > > > > > > >> > > > > > > > kubectl get nodes > > >> > > > > > > > NAME STATUS > ROLES > > >> > > > > > AGE > > >> > > > > > > > VERSION > > >> > > > > > > > pz-dev-k8s-ncus-00001-control-18dabaf0edb Ready > > >> > > control-plane > > >> > > > > > 5m2s > > >> > > > > > > > v1.28.4 > > >> > > > > > > > pz-dev-k8s-ncus-00001-control-18dabaf66c1 Ready > > >> > > control-plane > > >> > > > > > > 4m44s > > >> > > > > > > > v1.28.4 > > >> > > > > > > > pz-dev-k8s-ncus-00001-node-18dabafb0bd Ready > > <none> > > >> > > > > > > 4m47s > > >> > > > > > > > v1.28.4 > > >> > > > > > > > pz-dev-k8s-ncus-00001-node-18dabb006bc Ready > > <none> > > >> > > > > > > 4m47s > > >> > > > > > > > v1.28.4 > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > kubectl get pods --all-namespaces > > >> > > > > > > > NAMESPACE NAME > > >> > > > > > > > READY STATUS RESTARTS AGE > > >> > > > > > > > kube-system coredns-5dd5756b68-295gb > > >> > > > > > > > 1/1 Running 0 5m32s > > >> > > > > > > > kube-system coredns-5dd5756b68-cdwvw > > >> > > > > > > > 1/1 Running 0 5m33s > > >> > > > > > > > kube-system > > >> > > > etcd-pz-dev-k8s-ncus-00001-control-18dabaf0edb > > >> > > > > > > > 1/1 Running 0 5m36s > > >> > > > > > > > kube-system > > >> > > > etcd-pz-dev-k8s-ncus-00001-control-18dabaf66c1 > > >> > > > > > > > 1/1 Running 0 5m23s > > >> > > > > > > > kube-system > > >> > > > > > > > > kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabaf0edb > > >> > > > > > 1/1 > > >> > > > > > > > Running 0 5m36s > > >> > > > > > > > kube-system > > >> > > > > > > > > kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabaf66c1 > > >> > > > > > 1/1 > > >> > > > > > > > Running 0 5m23s > > >> > > > > > > > kube-system > > >> > > > > > > > > > >> > > kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabaf0edb > > >> > > > > > 1/1 > > >> > > > > > > > Running 1 (5m13s ago) 5m36s > > >> > > > > > > > kube-system > > >> > > > > > > > > > >> > > kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabaf66c1 > > >> > > > > > 1/1 > > >> > > > > > > > Running 0 5m23s > > >> > > > > > > > kube-system kube-proxy-2m8zb > > >> > > > > > > > 1/1 Running 0 5m26s > > >> > > > > > > > kube-system kube-proxy-cwpjg > > >> > > > > > > > 1/1 Running 0 5m33s > > >> > > > > > > > kube-system kube-proxy-l2vbf > > >> > > > > > > > 1/1 Running 0 5m26s > > >> > > > > > > > kube-system kube-proxy-qhlqt > > >> > > > > > > > 1/1 Running 0 5m23s > > >> > > > > > > > kube-system > > >> > > > > > > > > kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabaf0edb > > >> > > > > > 1/1 > > >> > > > > > > > Running 1 (5m8s ago) 5m36s > > >> > > > > > > > kube-system > > >> > > > > > > > > kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabaf66c1 > > >> > > > > > 1/1 > > >> > > > > > > > Running 0 5m23s > > >> > > > > > > > kube-system weave-net-5cs26 > > >> > > > > > > > 2/2 Running 1 (5m9s ago) 5m26s > > >> > > > > > > > kube-system weave-net-9zqrw > > >> > > > > > > > 2/2 Running 1 (5m28s ago) 5m33s > > >> > > > > > > > kube-system weave-net-fcwtr > > >> > > > > > > > 2/2 Running 0 5m23s > > >> > > > > > > > kube-system weave-net-lh2dh > > >> > > > > > > > 2/2 Running 1 (4m41s ago) 5m26s > > >> > > > > > > > kubernetes-dashboard > > >> > dashboard-metrics-scraper-5657497c4c-r284t > > >> > > > > > > > 1/1 Running 0 5m32s > > >> > > > > > > > kubernetes-dashboard > > 
kubernetes-dashboard-5b749d9495-vtwdd > > >> > > > > > > > 1/1 Running 0 5m32s > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > *Control 2 --- > pz-dev-k8s-ncus-00001-control-18dabaf66c1 > > >> :* > > >> > > > > > [FAILED] > > >> > > > > > > > Failed to start Execute cloud user/final scripts. > > >> > > > > > > > > > >> > > > > > > > kubectl get nodes > > >> > > > > > > > E0215 07:38:33.314561 2643 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > E0215 07:38:33.316751 2643 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > E0215 07:38:33.317754 2643 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > E0215 07:38:33.319181 2643 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > E0215 07:38:33.319975 2643 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > The connection to the server localhost:8080 was refused > - > > >> did > > >> > you > > >> > > > > > specify > > >> > > > > > > > the right host or port? 
> > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > kubectl get pods --all-namespaces > > >> > > > > > > > E0215 07:42:23.786704 2700 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > E0215 07:42:23.787455 2700 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > E0215 07:42:23.789529 2700 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > E0215 07:42:23.790051 2700 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > E0215 07:42:23.791742 2700 memcache.go:265] couldn't > > get > > >> > > current > > >> > > > > > > server > > >> > > > > > > > API group list: Get " > > http://localhost:8080/api?timeout=32s > > >> ": > > >> > > dial > > >> > > > > tcp > > >> > > > > > > > 127.0.0.1:8080: connect: connection refused > > >> > > > > > > > The connection to the server localhost:8080 was refused > - > > >> did > > >> > you > > >> > > > > > specify > > >> > > > > > > > the right host or port? > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > */var/log/daemon.log* > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > https://docs.google.com/document/d/1KuIx0jI4TuAXPgACY3rJQz3L2B8AjeqOL0Fm5r4YF5M/edit?usp=sharing > > >> > > > > > > > > > >> > > > > > > > */var/log/messages* > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > https://docs.google.com/document/d/15xet6kxI9rdgi4RkIHqtn-Wywph4h1Coyt_cyrJYkv4/edit?usp=sharing > > >> > > > > > > > > > >> > > > > > > > On Thu, Feb 15, 2024 at 1:21 AM Wei ZHOU < > > >> > [email protected]> > > >> > > > > > wrote: > > >> > > > > > > > > > >> > > > > > > > > Destroy ssvm and retry when new ssvm is Up ? > > >> > > > > > > > > > > >> > > > > > > > > -Wei > > >> > > > > > > > > > > >> > > > > > > > > 在 2024年2月15日星期四,Wally B <[email protected]> 写道: > > >> > > > > > > > > > > >> > > > > > > > > > Super Weird. I have two other versions added > > >> successfully > > >> > but > > >> > > > now > > >> > > > > > > when > > >> > > > > > > > I > > >> > > > > > > > > > try to add an ISO/version I get the following on the > > >> > > management > > >> > > > > > host. 
> > >> > > > > > > > > This > > >> > > > > > > > > > is the first time I've tried adding a K8s version > > since > > >> > > 4.18.0 > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > tail -f > > >> > /var/log/cloudstack/management/management-server.log > > >> > > | > > >> > > > > grep > > >> > > > > > > > ERROR > > >> > > > > > > > > > > > >> > > > > > > > > > 2024-02-15 06:26:18,900 DEBUG [c.c.a.t.Request] > > >> > > > > > > > > > (AgentManager-Handler-5:null) (logid:) Seq > > >> > > > > 48-6373437897659383816: > > >> > > > > > > > > > Processing: { Ans: , MgmtId: 15643723020152, via: > 48, > > >> Ver: > > >> > > v1, > > >> > > > > > > Flags: > > >> > > > > > > > > 10, > > >> > > > > > > > > > [{"com.cloud.agent.api.storage.DownloadAnswer":{" > > >> > > > > > > > > > jobId":"39d72d08-ab48-47dd-b09a-eee3ed816f4d"," > > >> > > > > > > > > > downloadPct":"0","errorString":"PKIX > > >> > > > > > > > > > path building failed: > > >> > > > > > > > > > > > >> sun.security.provider.certpath.SunCertPathBuilderException: > > >> > > > > unable > > >> > > > > > to > > >> > > > > > > > > find > > >> > > > > > > > > > valid certification path to requested > > >> > > > > > > > > > > > target","downloadStatus":"DOWNLOAD_ERROR","downloadPath" > > >> > > > > > > > > > > > :"/mnt/SecStorage/73075a0a-38a1-3631-8170-8887c04f6073/ > > >> > > > > > > > > > template/tmpl/1/223/dnld9180711723601784047tmp_"," > > >> > > > > > > > > > > installPath":"template/tmpl/1/223","templateSize":"(0 > > >> > > > > > > > > > bytes) 0","templatePhySicalSize":"(0 bytes) > > >> > > > > > > > > > 0","checkSum":"4dfb9d8be2191bc8bc4b89d78795a5 > > >> > > > > > > > > > b","result":"true","details":"PKIX > > >> > > > > > > > > > path building failed: > > >> > > > > > > > > > > > >> sun.security.provider.certpath.SunCertPathBuilderException: > > >> > > > > unable > > >> > > > > > to > > >> > > > > > > > > find > > >> > > > > > > > > > valid certification path to requested > > >> > > > > > > > > > > target","wait":"0","bypassHostMaintenance":"false"}}] > > } > > >> > > > > > > > > > > > >> > > > > > > > > > 2024-02-15 06:26:18,937 ERROR > > >> > > > > [o.a.c.s.i.BaseImageStoreDriverImpl] > > >> > > > > > > > > > (RemoteHostEndPoint-5:ctx-55063062) (logid:e21177cb) > > >> Failed > > >> > > to > > >> > > > > > > register > > >> > > > > > > > > > template: b6e79c5a-38d4-4cf5-8606-e6f209b6b4c2 with > > >> error: > > >> > > PKIX > > >> > > > > > path > > >> > > > > > > > > > building failed: > > >> > > > > > > > > > > > >> sun.security.provider.certpath.SunCertPathBuilderException: > > >> > > > > unable > > >> > > > > > to > > >> > > > > > > > > find > > >> > > > > > > > > > valid certification path to requested target > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > On Wed, Feb 14, 2024 at 11:27 PM Wei ZHOU < > > >> > > > [email protected] > > >> > > > > > > > >> > > > > > > > wrote: > > >> > > > > > > > > > > > >> > > > > > > > > > > Can you try 1.27.8 or 1.28.4 on > > >> > > > > > > https://download.cloudstack.org/cks/ > > >> > > > > > > > ? > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > -Wei > > >> > > > > > > > > > > > > >> > > > > > > > > > > 在 2024年2月15日星期四,Wally B <[email protected]> > 写道: > > >> > > > > > > > > > > > > >> > > > > > > > > > > > Hello Everyone! 
> > >> > > > > > > > > > > > > > >> > > > > > > > > > > > We are currently attempting to deploy k8s > clusters > > >> and > > >> > > are > > >> > > > > > > running > > >> > > > > > > > > into > > >> > > > > > > > > > > > issues with the deployment. > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > Current CS Environment: > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > CloudStack Verison: 4.19.0 (Same issue before we > > >> > upgraded > > >> > > > > from > > >> > > > > > > > > 4.18.1). > > >> > > > > > > > > > > > Hypervisor Type: Ubuntu 20.04.03 KVM > > >> > > > > > > > > > > > Attempted K8s Bins: 1.23.3, 1.27.3 > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > ======== ISSUE ========= > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > For some reason when we attempt the cluster > > >> > provisioning > > >> > > > all > > >> > > > > of > > >> > > > > > > the > > >> > > > > > > > > VMs > > >> > > > > > > > > > > > start up, SSH Keys are installed, but then at > > least > > >> 1, > > >> > > > > > sometimes > > >> > > > > > > 2 > > >> > > > > > > > of > > >> > > > > > > > > > the > > >> > > > > > > > > > > > VMs (control and/or worker) we get: > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > [FAILED] Failed to start > > deploy-kube-system.service. > > >> > > > > > > > > > > > [FAILED] Failed to start Execute cloud > user/final > > >> > > scripts. > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > The Cloudstack UI just says: > > >> > > > > > > > > > > > Create Kubernetes cluster test-cluster in > progress > > >> > > > > > > > > > > > for about an hour (I assume this is the 3600 > > second > > >> > > > timeout) > > >> > > > > > and > > >> > > > > > > > then > > >> > > > > > > > > > > > fails. > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > In the users event log it stays on: > > >> > > > > > > > > > > > INFO KUBERNETES.CLUSTER.CREATE > > >> > > > > > > > > > > > Scheduled > > >> > > > > > > > > > > > Creating Kubernetes cluster. Cluster Id: XXX > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > I can ssh into the VMs with their assigned > private > > >> > keys. > > >> > > I > > >> > > > > > > > attempted > > >> > > > > > > > > to > > >> > > > > > > > > > > run > > >> > > > > > > > > > > > the deploy-kube-system script but it just says > > >> already > > >> > > > > > > provisioned! > > >> > > > > > > > > I'm > > >> > > > > > > > > > > not > > >> > > > > > > > > > > > sure how I would Execute cloud user/final > scripts. > > >> If I > > >> > > > > attempt > > >> > > > > > > to > > >> > > > > > > > > stop > > >> > > > > > > > > > > the > > >> > > > > > > > > > > > cluster and start it again nothing seems to > > change. > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > Any help would be appreciated, I can provide any > > >> > details > > >> > > as > > >> > > > > > they > > >> > > > > > > > are > > >> > > > > > > > > > > > needed! > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > Thanks! > > >> > > > > > > > > > > > Wally > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > > > >
