Hi Raghava,

Could you please increase the maximum heap size in JAVA_OPTS from -Xmx2G to
-Xmx6G or so, after checking how much RAM is available on the system?
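
For the RAM check, here is a minimal Java sketch. It assumes JDK 11 or later and an OpenJDK/HotSpot JVM that exposes `com.sun.management.OperatingSystemMXBean`. It parses an -Xmx value written with the usual k/m/g suffixes and checks whether that heap, plus some headroom, fits in physical RAM. The heap is not the JVM's only memory consumer: metaspace, thread stacks, and direct buffers also need room. The class name `XmxCheck` and the 2 GiB reserve are illustrative assumptions, not CloudStack recommendations.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper: checks whether a proposed -Xmx value fits in physical RAM.
public class XmxCheck {

    // Parses a JVM size string such as "2G", "6g", "512m" or "1048576" into bytes.
    static long parseSize(String value) {
        String v = value.trim();
        char last = Character.toLowerCase(v.charAt(v.length() - 1));
        long multiplier;
        switch (last) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
            default:  multiplier = 1L; // no suffix: plain bytes
        }
        String digits = multiplier == 1L ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    // True if the heap plus an assumed reserve (non-heap JVM memory, OS, other services) fits in RAM.
    static boolean fits(long heapBytes, long physicalBytes, long reserveBytes) {
        return heapBytes + reserveBytes <= physicalBytes;
    }

    public static void main(String[] args) {
        String proposed = args.length > 0 ? args[0] : "6G";
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long physical = os.getTotalPhysicalMemorySize();
        long heap = parseSize(proposed);
        long reserve = parseSize("2G"); // assumption: headroom outside the Java heap
        System.out.printf("Physical RAM: %.1f GiB, proposed -Xmx%s = %.1f GiB, fits with reserve: %b%n",
                physical / (double) (1L << 30), proposed, heap / (double) (1L << 30),
                fits(heap, physical, reserve));
    }
}
```

With JDK 11 or later, `java XmxCheck.java 6G` runs it directly from source on the management server. After you edit JAVA_OPTS in /etc/default/cloudstack-management, restart the cloudstack-management service so the new -Xmx takes effect.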

Thanks & Regards
Nischal


On Thu, Jul 3, 2025 at 9:58 PM Raghava Yerubandi <raghavayeruba...@gmail.com> wrote:

> Hello Community,
>
> We recently upgraded our ACS environment from version *4.19.1* to *4.19.3*.
> Since the upgrade, the *cloudstack-management* service on our management
> nodes has been failing intermittently and does not recover until it is
> restarted manually.
>
> Reviewing the logs on the affected management servers, we found
> *java.lang.OutOfMemoryError: Java heap space* errors being logged
> repeatedly.
>
> Below are the logs.
>
> java[1039537]: INFO  [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:ctx-56efc1be) (logid:e0d2bb71) Running Kubernetes cluster state scanner on Kubernetes cluster : Test-Cluster for state: Alert
> java[1039537]: java.lang.OutOfMemoryError: Java heap space
> java[1039537]: Dumping heap to /var/log/cloudstack/management/java_pid1039537.hprof ...
> java[1039537]: Heap dump file created [1850233163 bytes in 21.923 secs]
> java[1039537]: WARN  [c.c.u.d.Merovingian2] (Network-Scavenger-1:ctx-8673d47e) (logid:7e2df3dc) Timed out on acquiring lock networks4235 .  Waited for 600seconds
> java[1039537]: com.cloud.utils.exception.CloudRuntimeException: Timed out on acquiring lock networks4235 .  Waited for 600seconds
> java[1039537]:         at com.cloud.utils.db.Merovingian2.acquire(Merovingian2.java:151)
> java[1039537]:         at com.cloud.utils.db.TransactionLegacy.lock(TransactionLegacy.java:386)
> java[1039537]:         at com.cloud.utils.db.GenericDaoBase.acquireInLockTable(GenericDaoBase.java:1074)
> java[1039537]:         at jdk.internal.reflect.GeneratedMethodAccessor474.invoke(Unknown Source)
> java[1039537]:         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java[1039537]:         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> java[1039537]:         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
> java[1039537]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
> java[1039537]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
> java[1039537]:         at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
> java[1039537]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
> java[1039537]:         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
> java[1039537]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
> java[1039537]:         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
> java[1039537]:         at com.sun.proxy.$Proxy52.acquireInLockTable(Unknown Source)
> java[1039537]:         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.shutdownNetwork(NetworkOrchestrator.java:3080)
> java[1039537]:         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$NetworkGarbageCollector.reallyRun(NetworkOrchestrator.java:3530)
> java[1039537]:         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$NetworkGarbageCollector.runInContext(NetworkOrchestrator.java:3466)
> java[1039537]:         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> java[1039537]:         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> java[1039537]:         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> java[1039537]:         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> java[1039537]:         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> java[1039537]:         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> java[1039537]:         at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> java[1039537]:         at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> java[1039537]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> java[1039537]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> java[1039537]:         at java.base/java.lang.Thread.run(Thread.java:829)
> java[1039537]: WARN  [o.a.c.e.o.NetworkOrchestrator] (Network-Scavenger-1:ctx-8673d47e) (logid:7e2df3dc) Network with id: 4235 doesn't exist, or unable to acquire lock for it as a part of network shutdown
> java[1039537]: ERROR [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-cdbfb71b) (logid:cd4a1b55) We have detected that at least one management server peer reports that this management server is down, perform active fencing
> Jul 02 05:43:10 sc-mgmt-03.speedcloud.co.in java[1039537]: INFO  [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-0fb8adb7) (logid:581cfc3f) Found the following agents behind on ping: [58, 67, 116, 129, 147]
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> java[1039537]: INFO  [o.a.c.v.s.VMSchedulerImpl] (VMSchedulerPollTask:ctx-8b5f50ea) (logid:69b8fd3c) Cleaned up 0 VM scheduled job entries
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> java[1039537]: INFO  [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:ctx-56efc1be) (logid:e0d2bb71) Running Kubernetes cluster state scanner on Kubernetes cluster : Stack-k8s-Test for state: Alert
> java[1039537]: WARN  [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-cbfa09ce) (logid:56ce8a0c) Unable to fetch basic router test results data from router r-15355-VM
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> java[1039537]: WARN  [c.c.a.d.ParamGenericValidationWorker] (qtp940584193-1652:ctx-d5a00ae4 ctx-0e610389 ctx-a37dcc30) (logid:a55ae204) Received unknown parameters for command listHosts. Unknown parameters : filter
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> java[1039537]: INFO  [c.c.s.d.VolumeStatsDaoImpl] (StatsCollector-2:ctx-b81c41e3) (logid:a37e0ad7) Removed a total of [0] volume_stats rows older than [Tue Jul 01 17:43:10 UTC 2025].
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> java[1039537]: ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-49034cc7) (logid:9cc0b9c6) db statistics collection failed due to / by zero
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> java[1039537]: ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-d3b2a30a) (logid:657e156f) db statistics collection failed due to / by zero
> java[1039537]: ERROR [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-c8128059) (logid:4b587df9) We have detected that at least one management server peer reports that this management server is down, perform active fencing
> Jul 02 05:43:11 sc-mgmt-03.speedcloud.co.in java[1039537]: INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-04653cf4) (logid:514252d2) Investigating why host 58 has disconnected with event PingTimeout
> java[1039537]: ERROR [c.c.c.ClusterFenceManagerImpl] (Cluster-Notification-1:ctx-086ed801) (logid:8cd30369) Received node isolation notification, will perform self-fencing and shut myself down
> java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
>
> We also checked */etc/default/cloudstack-management* and found the
> following JAVA_OPTS setting:
>
>
> JAVA_OPTS="-Djava.security.properties=/etc/cloudstack/management/java.security.ciphers -Djava.awt.headless=true -Dcom.sun.management.jmxremote=false -Xmx2G -XX:+UseParallelGC -XX:MaxGCPauseMillis=500 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/cloudstack/management/ -XX:ErrorFile=/var/log/cloudstack/management/cloudstack-management.err -Djava.util.Arrays.useLegacyMergeSort=true "
>
>
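
To relate the heap dump in the log to this -Xmx2G setting, the small calculation below compares the logged dump size (1850233163 bytes) with the 2 GiB cap. The class name `HeapDumpRatio` is hypothetical. An HPROF file's size is only a rough proxy for how full the heap was, so read the result as a hint that the heap was near its limit, not as a measurement.

```java
// Rough comparison of the logged heap dump size with the configured -Xmx2G limit.
// The dump file size only approximates heap occupancy at the time of the OOM.
public class HeapDumpRatio {

    static final long DUMP_BYTES = 1_850_233_163L; // from "Heap dump file created [1850233163 bytes ...]"
    static final long XMX_BYTES = 2L << 30;        // -Xmx2G = 2 GiB = 2147483648 bytes

    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    static double percentOfMax(long used, long max) {
        return 100.0 * used / max;
    }

    public static void main(String[] args) {
        System.out.printf("Heap dump: %.2f GiB of %.2f GiB max (%.0f%%)%n",
                toGiB(DUMP_BYTES), toGiB(XMX_BYTES), percentOfMax(DUMP_BYTES, XMX_BYTES));
    }
}
```

The dump works out to roughly 1.72 GiB, about 86% of the 2 GiB maximum.
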
> In addition, after we removed *-Djava.util.Arrays.useLegacyMergeSort=true*
> from *JAVA_OPTS*, we could no longer fetch the public IPs from either the
> dashboard or CMK; both failed with *Request failed (431): comparison method
> violates its general contract*.
>
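
For context on the 431 error: its message matches the `IllegalArgumentException` ("Comparison method violates its general contract!") that Java's default object sort (TimSort, used by `Collections.sort` and `List.sort` since Java 7) can throw when a `Comparator` is inconsistent. For example, `compare(a, b)` and `compare(b, a)` might not have opposite signs. `-Djava.util.Arrays.useLegacyMergeSort=true` switches back to the older merge sort, which does not detect such violations, so the flag masks the inconsistent comparator rather than fixing it. This thread does not identify which CloudStack comparator is involved in listing public IPs. The `IpOrder` class and both comparators below are hypothetical illustrations, and `isAntisymmetric` checks only one part of the contract (sign antisymmetry), not transitivity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: neither comparator is taken from CloudStack.
public class IpOrder {

    // Converts a dotted IPv4 string to a number so 10.0.0.9 sorts before 10.0.0.10.
    static long ipv4ToLong(String ip) {
        long value = 0;
        for (String octet : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    // BROKEN: a null on either side is reported as "smaller", so compare(a, b) and
    // compare(b, a) can both be negative. TimSort may reject such a comparator.
    static final Comparator<String> BROKEN = (a, b) -> {
        if (a == null) return -1;
        if (b == null) return -1;
        return Long.compare(ipv4ToLong(a), ipv4ToLong(b));
    };

    // FIXED: nulls first, then numeric IPv4 order; a consistent total order.
    static final Comparator<String> FIXED =
            Comparator.nullsFirst(Comparator.comparingLong(IpOrder::ipv4ToLong));

    // Checks sgn(compare(x, y)) == -sgn(compare(y, x)) for every pair in the sample.
    static <T> boolean isAntisymmetric(List<T> sample, Comparator<? super T> cmp) {
        for (T x : sample) {
            for (T y : sample) {
                if (Integer.signum(cmp.compare(x, y)) != -Integer.signum(cmp.compare(y, x))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> ips = new ArrayList<>(Arrays.asList("10.0.0.10", null, "10.0.0.9", "192.168.1.1"));
        System.out.println("BROKEN antisymmetric: " + isAntisymmetric(ips, BROKEN));
        System.out.println("FIXED antisymmetric:  " + isAntisymmetric(ips, FIXED));
        ips.sort(FIXED);
        System.out.println("Sorted: " + ips);
    }
}
```

If the failing comparator can be identified, giving it a consistent ordering like `FIXED` would allow the flag to be dropped. Until then, keeping -Djava.util.Arrays.useLegacyMergeSort=true remains the workaround.
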
