Hello Community,

We recently upgraded our ACS environment from version *4.19.1* to *4.19.3*. Since the upgrade, the *cloudstack-management* service on our management nodes has been failing intermittently, and it stays down until we restart it manually.
Upon reviewing the logs from the affected management servers, we observed that *java.lang.OutOfMemoryError: Java heap space* exceptions are being logged repeatedly. The relevant log excerpt is below.

java[1039537]: INFO [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:ctx-56efc1be) (logid:e0d2bb71) Running Kubernetes cluster state scanner on Kubernetes cluster : Test-Cluster for state: Alert
java[1039537]: java.lang.OutOfMemoryError: Java heap space
java[1039537]: Dumping heap to /var/log/cloudstack/management/java_pid1039537.hprof ...
java[1039537]: Heap dump file created [1850233163 bytes in 21.923 secs]
java[1039537]: WARN [c.c.u.d.Merovingian2] (Network-Scavenger-1:ctx-8673d47e) (logid:7e2df3dc) Timed out on acquiring lock networks4235 . Waited for 600seconds
java[1039537]: com.cloud.utils.exception.CloudRuntimeException: Timed out on acquiring lock networks4235 . Waited for 600seconds
java[1039537]: at com.cloud.utils.db.Merovingian2.acquire(Merovingian2.java:151)
java[1039537]: at com.cloud.utils.db.TransactionLegacy.lock(TransactionLegacy.java:386)
java[1039537]: at com.cloud.utils.db.GenericDaoBase.acquireInLockTable(GenericDaoBase.java:1074)
java[1039537]: at jdk.internal.reflect.GeneratedMethodAccessor474.invoke(Unknown Source)
java[1039537]: at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java[1039537]: at java.base/java.lang.reflect.Method.invoke(Method.java:566)
java[1039537]: at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
java[1039537]: at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
java[1039537]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
java[1039537]: at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
java[1039537]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
java[1039537]: at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
java[1039537]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
java[1039537]: at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
java[1039537]: at com.sun.proxy.$Proxy52.acquireInLockTable(Unknown Source)
java[1039537]: at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.shutdownNetwork(NetworkOrchestrator.java:3080)
java[1039537]: at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$NetworkGarbageCollector.reallyRun(NetworkOrchestrator.java:3530)
java[1039537]: at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$NetworkGarbageCollector.runInContext(NetworkOrchestrator.java:3466)
java[1039537]: at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
java[1039537]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
java[1039537]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
java[1039537]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
java[1039537]: at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
java[1039537]: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
java[1039537]: at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
java[1039537]: at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
java[1039537]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java[1039537]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java[1039537]: at java.base/java.lang.Thread.run(Thread.java:829)
java[1039537]: WARN [o.a.c.e.o.NetworkOrchestrator] (Network-Scavenger-1:ctx-8673d47e) (logid:7e2df3dc) Network with id: 4235 doesn't exist, or unable to acquire lock for it as a part of network shutdown
java[1039537]: ERROR [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-cdbfb71b) (logid:cd4a1b55) We have detected that at least one management server peer reports that this management server is down, perform active fencing
Jul 02 05:43:10 sc-mgmt-03.speedcloud.co.in java[1039537]: INFO [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-0fb8adb7) (logid:581cfc3f) Found the following agents behind on ping: [58, 67, 116, 129, 147]
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
java[1039537]: INFO [o.a.c.v.s.VMSchedulerImpl] (VMSchedulerPollTask:ctx-8b5f50ea) (logid:69b8fd3c) Cleaned up 0 VM scheduled job entries
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
java[1039537]: INFO [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:ctx-56efc1be) (logid:e0d2bb71) Running Kubernetes cluster state scanner on Kubernetes cluster : Stack-k8s-Test for state: Alert
java[1039537]: WARN [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-cbfa09ce) (logid:56ce8a0c) Unable to fetch basic router test results data from router r-15355-VM
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
java[1039537]: WARN [c.c.a.d.ParamGenericValidationWorker] (qtp940584193-1652:ctx-d5a00ae4 ctx-0e610389 ctx-a37dcc30) (logid:a55ae204) Received unknown parameters for command listHosts. Unknown parameters : filter
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
java[1039537]: INFO [c.c.s.d.VolumeStatsDaoImpl] (StatsCollector-2:ctx-b81c41e3) (logid:a37e0ad7) Removed a total of [0] volume_stats rows older than [Tue Jul 01 17:43:10 UTC 2025].
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
java[1039537]: ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-49034cc7) (logid:9cc0b9c6) db statistics collection failed due to / by zero
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
java[1039537]: ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-d3b2a30a) (logid:657e156f) db statistics collection failed due to / by zero
java[1039537]: ERROR [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-c8128059) (logid:4b587df9) We have detected that at least one management server peer reports that this management server is down, perform active fencing
Jul 02 05:43:11 sc-mgmt-03.speedcloud.co.in java[1039537]: INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-04653cf4) (logid:514252d2) Investigating why host 58 has disconnected with event PingTimeout
java[1039537]: ERROR [c.c.c.ClusterFenceManagerImpl] (Cluster-Notification-1:ctx-086ed801) (logid:8cd30369) Received node isolation notification, will perform self-fencing and shut myself down
java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36

Additionally, we checked the */etc/default/cloudstack-management* configuration file and found the following Java options configured under *JAVA_OPTS*:

*JAVA_OPTS="-Djava.security.properties=/etc/cloudstack/management/java.security.ciphers -Djava.awt.headless=true -Dcom.sun.management.jmxremote=false -Xmx2G -XX:+UseParallelGC -XX:MaxGCPauseMillis=500 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/cloudstack/management/ -XX:ErrorFile=/var/log/cloudstack/management/cloudstack-management.err -Djava.util.Arrays.useLegacyMergeSort=true "*

We also tried removing the *-Djava.util.Arrays.useLegacyMergeSort=true* option from *JAVA_OPTS*. With that option removed, we are unable to fetch the Public IPs from either the dashboard or CMK. The request fails with the error *Request failed (431): comparison method violates its general contract*.
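
For context on that last error: since Java 7, the JDK's default object sort (TimSort) can throw an IllegalArgumentException with the message "Comparison method violates its general contract!" when it notices an inconsistent comparator, whereas the legacy merge sort enabled by *-Djava.util.Arrays.useLegacyMergeSort=true* does not perform that check. The sketch below is only an illustration of what such a violation looks like. The class *ComparatorContractCheck* and both comparators are hypothetical and are not taken from CloudStack; we have not identified which comparator in the management server is actually involved.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not CloudStack code): checks a comparator against two
// rules of the Comparator contract on a small sample of values.
class ComparatorContractCheck {

    // Subtraction overflows for values far apart, so this comparator is inconsistent.
    static final Comparator<Integer> BROKEN = (a, b) -> a - b;

    // Integer.compare never overflows and satisfies the contract.
    static final Comparator<Integer> SAFE = Integer::compare;

    // True if sgn(compare(a, b)) == -sgn(compare(b, a)) for every pair in the sample.
    static <T> boolean isAntisymmetric(Comparator<T> cmp, List<T> sample) {
        for (T a : sample) {
            for (T b : sample) {
                int forward = Integer.signum(cmp.compare(a, b));
                int backward = Integer.signum(cmp.compare(b, a));
                if (forward != -backward) {
                    return false;
                }
            }
        }
        return true;
    }

    // True if a > b and b > c always implies a > c within the sample.
    static <T> boolean isTransitive(Comparator<T> cmp, List<T> sample) {
        for (T a : sample) {
            for (T b : sample) {
                for (T c : sample) {
                    if (cmp.compare(a, b) > 0 && cmp.compare(b, c) > 0
                            && cmp.compare(a, c) <= 0) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(Integer.MIN_VALUE, 0, 1);
        System.out.println("BROKEN antisymmetric: " + isAntisymmetric(BROKEN, sample));
        System.out.println("BROKEN transitive:    " + isTransitive(BROKEN, sample));
        System.out.println("SAFE antisymmetric:   " + isAntisymmetric(SAFE, sample));
        System.out.println("SAFE transitive:      " + isTransitive(SAFE, sample));
    }
}
```

On this sample the subtraction-based comparator fails both checks and Integer::compare passes both. If this is what is happening in our case, the legacy merge sort flag would only be hiding the inconsistent comparator rather than fixing it.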