GitHub user baltazorbest created a discussion: CKS firewall and cluster scaling 
problem if the default firewall rules are deleted

### problem

After creating a k8s cluster and removing the default firewall rules, I cannot 
scale the cluster; scaling fails with a network error:

```
2025-10-02 13:02:14,919 WARN  [o.a.c.m.w.WebhookServiceImpl] (API-Job-Executor-36:[ctx-11ddf09f, job-9391, ctx-04c8b6be, ctx-35bd1fab, ctx-42198853]) (logid:2fbf4611) Skipping delivering event Event {"description":"{\"event\":\"VM.START\",\"status\":\"Completed\"}","eventId":null,"eventType":"VM.START","eventUuid":null,"resourceType":"VirtualMachine","resourceUUID":null} to any webhook as account ID is missing
2025-10-02 13:02:14,919 WARN  [o.a.c.f.e.EventDistributorImpl] (API-Job-Executor-36:[ctx-11ddf09f, job-9391, ctx-04c8b6be, ctx-35bd1fab, ctx-42198853]) (logid:2fbf4611) Failed to publish event [category: ActionEvent, type: VM.START] on bus webhookEventBus
2025-10-02 13:02:14,936 ERROR [c.c.k.c.a.KubernetesClusterScaleWorker] (API-Job-Executor-36:[ctx-11ddf09f, job-9391, ctx-04c8b6be]) (logid:2fbf4611) Scaling failed for Kubernetes cluster : my-k8s, unable to update network rules
com.cloud.exception.ManagementServerException: Firewall rule for node SSH access can't be provisioned
        at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterIsolatedNetworkRules(KubernetesClusterScaleWorker.java:128)
        at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterNetworkRules(KubernetesClusterScaleWorker.java:176)
        at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleUpKubernetesClusterSize(KubernetesClusterScaleWorker.java:388)
        at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterSize(KubernetesClusterScaleWorker.java:424)
        at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleCluster(KubernetesClusterScaleWorker.java:477)
        at com.cloud.kubernetes.cluster.KubernetesClusterManagerImpl.scaleKubernetesCluster(KubernetesClusterManagerImpl.java:1767)
        at jdk.internal.reflect.GeneratedMethodAccessor1219.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:569)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
        at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:105)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
        at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:52)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
        at jdk.proxy3/jdk.proxy3.$Proxy517.scaleKubernetesCluster(Unknown Source)
        at org.apache.cloudstack.api.command.user.kubernetes.cluster.ScaleKubernetesClusterCmd.execute(ScaleKubernetesClusterCmd.java:160)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:173)
        at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:110)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:652)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:600)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
```
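
My guess at where this exception comes from, based on the stack trace 
(KubernetesClusterScaleWorker.java:128) and the behaviour described below: the 
scale worker seems to look for an existing node-SSH firewall rule starting at 
port 2222 on the cluster's public IP so it can widen its port range for the new 
node, and gives up when no such rule exists. The standalone Java sketch below 
only illustrates that presumed control flow; all type and helper names are 
stand-ins, not the actual CloudStack source (the real worker throws 
com.cloud.exception.ManagementServerException).

```java
// Illustrative sketch only: a reconstruction of the presumed logic behind
// "Firewall rule for node SSH access can't be provisioned". Not CloudStack code.
import java.util.List;
import java.util.Optional;

public class SshRuleScaleSketch {

    record FirewallRule(String cidr, int startPort, int endPort) {}

    static final int NODE_SSH_START_PORT = 2222; // default CKS node SSH base port

    // On scale-up, one public SSH port per node is needed (2222, 2223, ...), so the
    // worker apparently looks for the existing rule starting at 2222 to widen it.
    static FirewallRule scaleSshRule(List<FirewallRule> rulesOnPublicIp, int nodeCount) {
        Optional<FirewallRule> existing = rulesOnPublicIp.stream()
                .filter(r -> r.startPort() == NODE_SSH_START_PORT)
                .findFirst();
        if (existing.isEmpty()) {
            // The case in this report: the default "0.0.0.0/0 TCP 2222 2225" rule was
            // deleted and no replacement rule starts at 2222, so scaling is aborted.
            throw new IllegalStateException("Firewall rule for node SSH access can't be provisioned");
        }
        // Otherwise the rule is re-created to cover the enlarged port range.
        return new FirewallRule(existing.get().cidr(),
                NODE_SSH_START_PORT, NODE_SSH_START_PORT + nodeCount - 1);
    }

    public static void main(String[] args) {
        // With the default rule in place, scaling to 5 nodes widens 2222-2225 to 2222-2226.
        System.out.println(scaleSshRule(List.of(new FirewallRule("0.0.0.0/0", 2222, 2225)), 5));
        // With only the replacement "TCP 1 65534" rules, nothing starts at 2222 -> failure.
        try {
            scaleSshRule(List.of(new FirewallRule("10.10.10.1/24", 1, 65534)), 5);
        } catch (IllegalStateException e) {
            System.out.println("Scaling fails: " + e.getMessage());
        }
    }
}
```

If this reading is right, it would also explain why re-adding a rule that starts 
at port 2222 (the workaround further down) makes scaling work again, even with a 
restricted source CIDR.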

### versions

OS is Ubuntu 22.04
CloudStack version is 4.20.1
K8s version is v1.33.1-calico-x86_64
Primary storage is Ceph RBD 19.2.3
Libvirt version is 8.0.0-1ubuntu7.12



### The steps to reproduce the bug

1. Create a network with any subnet (e.g., 10.10.10.1/24).
2. Create a k8s cluster in HA mode with one worker node, using the previously 
created external network.
3. Remove the default firewall rules:

- 0.0.0.0/0 TCP 6443 6443
- 0.0.0.0/0 TCP 2222 2225

4. Add new firewall rules:

- 10.10.10.1/24 TCP 1 65534
- 1.1.1.1/32 TCP 1 65534

5. Try to scale the cluster to two worker nodes.

Result:
An error occurs, although the new instance is created.

Workaround:
When the following firewall rules are used instead:

- 10.10.10.1/24 TCP 6443 6443
- 10.10.10.1/24 TCP 2222 2225

→ Scaling the cluster works correctly.
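
For reference, below is a minimal, self-contained sketch that re-creates the 
workaround rules through the standard CloudStack API (`createFirewallRule`), for 
anyone who prefers the API over the UI. The endpoint URL, API/secret keys and the 
public-IP UUID are placeholders to adjust for your environment; the request 
signing (sorted query string, lower-cased, HMAC-SHA1 with the secret key, 
base64- and URL-encoded) is the usual CloudStack scheme.

```java
// Hedged sketch: restore the CKS default ports (6443, 2222-2225) from a restricted
// CIDR via the CloudStack API. Endpoint, keys and the public-IP UUID are placeholders.
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class RestoreCksFirewallRules {
    static final String ENDPOINT = "https://cloudstack.example.com/client/api"; // placeholder
    static final String API_KEY = "YOUR_API_KEY";        // placeholder
    static final String SECRET_KEY = "YOUR_SECRET_KEY";  // placeholder

    public static void main(String[] args) throws Exception {
        String publicIpId = "UUID-OF-CLUSTER-PUBLIC-IP"; // placeholder: the cluster's source NAT IP
        // Same ports as the CKS defaults, but restricted as in the workaround above.
        createFirewallRule(publicIpId, "10.10.10.1/24", 6443, 6443); // k8s API server
        createFirewallRule(publicIpId, "10.10.10.1/24", 2222, 2225); // node SSH ports
    }

    static void createFirewallRule(String ipId, String cidr, int start, int end) throws Exception {
        Map<String, String> params = new TreeMap<>(); // sorted, as required for signing
        params.put("command", "createFirewallRule");
        params.put("ipaddressid", ipId);
        params.put("protocol", "tcp");
        params.put("cidrlist", cidr);
        params.put("startport", String.valueOf(start));
        params.put("endport", String.valueOf(end));
        params.put("response", "json");
        params.put("apikey", API_KEY);
        String query = buildQuery(params);
        String url = ENDPOINT + "?" + query + "&signature=" + sign(query);
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(HttpRequest.newBuilder(URI.create(url)).GET().build(),
                      HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
    }

    static String buildQuery(Map<String, String> sortedParams) {
        StringBuilder sb = new StringBuilder();
        sortedParams.forEach((k, v) -> {
            if (sb.length() > 0) sb.append('&');
            sb.append(k).append('=').append(URLEncoder.encode(v, StandardCharsets.UTF_8));
        });
        return sb.toString();
    }

    // HMAC-SHA1 over the lower-cased query string, base64- then URL-encoded.
    static String sign(String query) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(SECRET_KEY.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] digest = mac.doFinal(query.toLowerCase().getBytes(StandardCharsets.UTF_8));
        return URLEncoder.encode(Base64.getEncoder().encodeToString(digest), StandardCharsets.UTF_8);
    }
}
```

The same rules can of course be created with CloudMonkey or the UI; the point is 
only that keeping a rule on the default ports while restricting the source CIDR 
is enough to make scaling work.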

Additional Issues Observed
1. Opening SSH (2222–2225) and k8s management (6443) to 0.0.0.0/0 is a security 
risk.
2. When the k8s cluster enters the Alert state, it is impossible to repair it. The 
only options available are stop or delete.
- After stopping and starting the cluster, it changes state to Running. The new 
worker instance is created, but it does not join the Kubernetes cluster (it 
isn't present in the cluster).
- However, scaling the cluster is still not possible, and deleting an 
individual instance also fails.
- The only option left is to remove the entire cluster and create it again.



### What to do about it?

_No response_

GitHub link: https://github.com/apache/cloudstack/discussions/11783
