Hi Raghava,

Can you please increase the heap size in JAVA_OPTS from
-Xmx2G to -Xmx6G or so, after checking how much RAM is available on the system?

Thanks & Regards
Nischal

On Thu, Jul 3, 2025 at 9:58 PM Raghava Yerubandi <raghavayeruba...@gmail.com> wrote:

> Hello Community,
>
> We recently upgraded our ACS environment from version *4.19.1* to *4.19.3*.
> Post-upgrade, we are experiencing recurring issues where the
> *cloudstack-management* services on our management nodes are failing
> intermittently until the services are manually restarted.
>
> Upon reviewing the logs from the affected management servers, we
> observed *java.lang.OutOfMemoryError: Java heap space* exceptions
> being logged repeatedly.
>
> Below are the logs.
>
> > java[1039537]: INFO [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:ctx-56efc1be) (logid:e0d2bb71) Running Kubernetes cluster state scanner on Kubernetes cluster : Test-Cluster for state: Alert
> > java[1039537]: java.lang.OutOfMemoryError: Java heap space
> > java[1039537]: Dumping heap to /var/log/cloudstack/management/java_pid1039537.hprof ...
> > java[1039537]: Heap dump file created [1850233163 bytes in 21.923 secs]
> > java[1039537]: WARN [c.c.u.d.Merovingian2] (Network-Scavenger-1:ctx-8673d47e) (logid:7e2df3dc) Timed out on acquiring lock networks4235 . Waited for 600seconds
> > java[1039537]: com.cloud.utils.exception.CloudRuntimeException: Timed out on acquiring lock networks4235 . Waited for 600seconds
> > java[1039537]: at com.cloud.utils.db.Merovingian2.acquire(Merovingian2.java:151)
> > java[1039537]: at com.cloud.utils.db.TransactionLegacy.lock(TransactionLegacy.java:386)
> > java[1039537]: at com.cloud.utils.db.GenericDaoBase.acquireInLockTable(GenericDaoBase.java:1074)
> > java[1039537]: at jdk.internal.reflect.GeneratedMethodAccessor474.invoke(Unknown Source)
> > java[1039537]: at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > java[1039537]: at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > java[1039537]: at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
> > java[1039537]: at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
> > java[1039537]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
> > java[1039537]: at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
> > java[1039537]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
> > java[1039537]: at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
> > java[1039537]: at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
> > java[1039537]: at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
> > java[1039537]: at com.sun.proxy.$Proxy52.acquireInLockTable(Unknown Source)
> > java[1039537]: at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.shutdownNetwork(NetworkOrchestrator.java:3080)
> > java[1039537]: at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$NetworkGarbageCollector.reallyRun(NetworkOrchestrator.java:3530)
> > java[1039537]: at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$NetworkGarbageCollector.runInContext(NetworkOrchestrator.java:3466)
> > java[1039537]: at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> > java[1039537]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> > java[1039537]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> > java[1039537]: at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> > java[1039537]: at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> > java[1039537]: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> > java[1039537]: at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> > java[1039537]: at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> > java[1039537]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > java[1039537]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > java[1039537]: at java.base/java.lang.Thread.run(Thread.java:829)
> > java[1039537]: WARN [o.a.c.e.o.NetworkOrchestrator] (Network-Scavenger-1:ctx-8673d47e) (logid:7e2df3dc) Network with id: 4235 doesn't exist, or unable to acquire lock for it as a part of network shutdown
> > java[1039537]: ERROR [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-cdbfb71b) (logid:cd4a1b55) We have detected that at least one management server peer reports that this management server is down, perform active fencing
> > Jul 02 05:43:10 sc-mgmt-03.speedcloud.co.in java[1039537]: INFO [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-0fb8adb7) (logid:581cfc3f) Found the following agents behind on ping: [58, 67, 116, 129, 147]
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > java[1039537]: INFO [o.a.c.v.s.VMSchedulerImpl] (VMSchedulerPollTask:ctx-8b5f50ea) (logid:69b8fd3c) Cleaned up 0 VM scheduled job entries
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > java[1039537]: INFO [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:ctx-56efc1be) (logid:e0d2bb71) Running Kubernetes cluster state scanner on Kubernetes cluster : Stack-k8s-Test for state: Alert
> > java[1039537]: WARN [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-cbfa09ce) (logid:56ce8a0c) Unable to fetch basic router test results data from router r-15355-VM
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > java[1039537]: WARN [c.c.a.d.ParamGenericValidationWorker] (qtp940584193-1652:ctx-d5a00ae4 ctx-0e610389 ctx-a37dcc30) (logid:a55ae204) Received unknown parameters for command listHosts. Unknown parameters : filter
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > java[1039537]: INFO [c.c.s.d.VolumeStatsDaoImpl] (StatsCollector-2:ctx-b81c41e3) (logid:a37e0ad7) Removed a total of [0] volume_stats rows older than [Tue Jul 01 17:43:10 UTC 2025].
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > java[1039537]: ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-49034cc7) (logid:9cc0b9c6) db statistics collection failed due to / by zero
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > java[1039537]: ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-d3b2a30a) (logid:657e156f) db statistics collection failed due to / by zero
> > java[1039537]: ERROR [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-c8128059) (logid:4b587df9) We have detected that at least one management server peer reports that this management server is down, perform active fencing
> > Jul 02 05:43:11 sc-mgmt-03.speedcloud.co.in java[1039537]: INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-04653cf4) (logid:514252d2) Investigating why host 58 has disconnected with event PingTimeout
> > java[1039537]: ERROR [c.c.c.ClusterFenceManagerImpl] (Cluster-Notification-1:ctx-086ed801) (logid:8cd30369) Received node isolation notification, will perform self-fencing and shut myself down
> > java[1039537]: WARN [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
>
> Additionally, we checked the */etc/default/cloudstack-management*
> configuration file and found the following Java options configured under
> JAVA_OPTS.
>
> > JAVA_OPTS="-Djava.security.properties=/etc/cloudstack/management/java.security.ciphers -Djava.awt.headless=true -Dcom.sun.management.jmxremote=false -Xmx2G -XX:+UseParallelGC -XX:MaxGCPauseMillis=500 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/cloudstack/management/ -XX:ErrorFile=/var/log/cloudstack/management/cloudstack-management.err -Djava.util.Arrays.useLegacyMergeSort=true "
>
> Also, after removing the *-Djava.util.Arrays.useLegacyMergeSort=true*
> option from JAVA_OPTS, we are unable to fetch the public IPs from the
> dashboard or from CMK. We got an error stating *Request failed (431):
> comparison method violates its general contract*.
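
A rough sketch of the heap-size change suggested at the top of this thread, assuming a Linux host and that -Xmx is set in /etc/default/cloudstack-management as in the quoted message. The set_xmx helper name and the shortened JAVA_OPTS line are made up for illustration, and the script only edits a scratch copy, so nothing live is changed.

```shell
#!/bin/sh
# Sketch only: preview a heap-size change without touching the live file.

# set_xmx NEW_SIZE FILE: print FILE with the -Xmx value replaced by NEW_SIZE.
# (set_xmx is a hypothetical helper name, not part of CloudStack.)
set_xmx() {
    sed -E "s/-Xmx[0-9]+[kKmMgG]?/-Xmx$1/" "$2"
}

# Show total and available RAM so the new heap leaves room for the OS.
if [ -r /proc/meminfo ]; then
    grep -E '^(MemTotal|MemAvailable):' /proc/meminfo
fi

# Try the substitution on a scratch file holding a shortened JAVA_OPTS line.
demo=$(mktemp)
echo 'JAVA_OPTS="-Djava.awt.headless=true -Xmx2G -XX:+UseParallelGC"' > "$demo"
result=$(set_xmx 6G "$demo")
echo "$result"
rm -f "$demo"
```

If the preview looks right, the same substitution can be applied to the real file (after taking a backup), followed by a restart of the cloudstack-management service. The 6G figure is only the value suggested above, not a tested recommendation; pick a size that fits the RAM the node actually has.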