GitHub user baltazorbest closed a discussion: CKS Firewall and scaling cluster
Problem when the default firewall rules are deleted
### problem
After creating a k8s cluster and removing the default firewall rules, I cannot
scale the cluster; scaling fails with a network error:
```
2025-10-02 13:02:14,919 WARN  [o.a.c.m.w.WebhookServiceImpl] (API-Job-Executor-36:[ctx-11ddf09f, job-9391, ctx-04c8b6be, ctx-35bd1fab, ctx-42198853]) (logid:2fbf4611) Skipping delivering event Event {"description":"{\"event\":\"VM.START\",\"status\":\"Completed\"}","eventId":null,"eventType":"VM.START","eventUuid":null,"resourceType":"VirtualMachine","resourceUUID":null} to any webhook as account ID is missing
2025-10-02 13:02:14,919 WARN  [o.a.c.f.e.EventDistributorImpl] (API-Job-Executor-36:[ctx-11ddf09f, job-9391, ctx-04c8b6be, ctx-35bd1fab, ctx-42198853]) (logid:2fbf4611) Failed to publish event [category: ActionEvent, type: VM.START] on bus webhookEventBus
2025-10-02 13:02:14,936 ERROR [c.c.k.c.a.KubernetesClusterScaleWorker] (API-Job-Executor-36:[ctx-11ddf09f, job-9391, ctx-04c8b6be]) (logid:2fbf4611) Scaling failed for Kubernetes cluster : my-k8s, unable to update network rules
com.cloud.exception.ManagementServerException: Firewall rule for node SSH access can't be provisioned
    at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterIsolatedNetworkRules(KubernetesClusterScaleWorker.java:128)
    at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterNetworkRules(KubernetesClusterScaleWorker.java:176)
    at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleUpKubernetesClusterSize(KubernetesClusterScaleWorker.java:388)
    at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterSize(KubernetesClusterScaleWorker.java:424)
    at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleCluster(KubernetesClusterScaleWorker.java:477)
    at com.cloud.kubernetes.cluster.KubernetesClusterManagerImpl.scaleKubernetesCluster(KubernetesClusterManagerImpl.java:1767)
    at jdk.internal.reflect.GeneratedMethodAccessor1219.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:105)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
    at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:52)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
    at jdk.proxy3/jdk.proxy3.$Proxy517.scaleKubernetesCluster(Unknown Source)
    at org.apache.cloudstack.api.command.user.kubernetes.cluster.ScaleKubernetesClusterCmd.execute(ScaleKubernetesClusterCmd.java:160)
    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:173)
    at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:110)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:652)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:600)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
```
### versions
- OS: Ubuntu 22.04
- CloudStack version: 4.20.1
- K8s version: v1.33.1-calico-x86_64
- Primary storage: Ceph RBD 19.2.3
- Libvirt version: 8.0.0-1ubuntu7.12
### The steps to reproduce the bug
1. Create a network with any subnet (e.g., 10.10.10.1/24).
2. Create a k8s cluster in HA mode with one worker node, using the previously
created external network.
3. Remove the default firewall rules:
- 0.0.0.0/0 TCP 6443 6443
- 0.0.0.0/0 TCP 2222 2225
4. Add new firewall rules:
- 10.10.10.1/24 TCP 1 65534
- 1.1.1.1/32 TCP 1 65534
5. Try to scale the cluster to two worker nodes.
Result: an error occurs, although the new instance is created.
### Workaround
When using the following firewall rules instead:
- 10.10.10.1/24 TCP 6443 6443
- 10.10.10.1/24 TCP 2222 2225
→ Scaling the cluster works correctly.
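For reference, the scoped rules from the workaround above can also be re-created programmatically through the CloudStack API's `createFirewallRule` call, which avoids opening the ports to 0.0.0.0/0 by hand in the UI. The sketch below is only illustrative: it builds and signs such a request using CloudStack's documented API signing scheme (parameters sorted by name, values URL-encoded, the whole query lowercased, HMAC-SHA1 with the secret key, Base64-encoded). The endpoint, IP-address UUID, and API keys are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params: dict, secret_key: str) -> str:
    """Sign a CloudStack API request.

    CloudStack expects the query string built from the parameters sorted
    by name with URL-encoded values, lowercased as a whole, signed with
    HMAC-SHA1 using the account's secret key, and Base64-encoded.
    """
    query = "&".join(
        f"{k}={quote(str(v), safe='')}" for k, v in sorted(params.items())
    )
    digest = hmac.new(
        secret_key.encode(), query.lower().encode(), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode()


# Placeholder values -- substitute your own IP-address UUID and API keys.
params = {
    "command": "createFirewallRule",
    "ipaddressid": "UUID-OF-CLUSTER-PUBLIC-IP",
    "protocol": "tcp",
    "startport": "6443",
    "endport": "6443",
    "cidrlist": "10.10.10.0/24",  # restrict to your subnet, not 0.0.0.0/0
    "response": "json",
    "apikey": "YOUR-API-KEY",
}
params["signature"] = sign_request(params, "YOUR-SECRET-KEY")
# The signed query would then be sent to http://<management-server>:8080/client/api
```

A second call with `startport=2222` / `endport=2225` would cover the node SSH range that the scale worker tries to provision.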
### Additional issues observed
1. Opening SSH (2222–2225) and k8s management (6443) to 0.0.0.0/0 is a security
risk.
2. When the k8s cluster enters the alert state, it is impossible to repair it.
The only options available are stop or delete.
- After stopping and starting the cluster, its state changes to running. The
new worker instance is created, but it does not join the Kubernetes cluster
(it isn't present in the cluster).
- However, scaling the cluster is still not possible, and deleting an
individual instance also fails.
- The only option left is to remove the entire cluster and create it again.
### What to do about it?
_No response_
GitHub link: https://github.com/apache/cloudstack/discussions/11783