Hello Nischal,

As per your suggestion, we increased the Java heap size (-Xmx in JAVA_OPTS)
from 2G to 6G and monitored the management servers for two days. We did not
encounter any management service failures. Thank you for the solution.
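
For anyone applying the same change, here is a minimal sketch of how the
effective heap limit and the host RAM can be compared before picking a value.
The class name HeapCheck and the helper toMiB are only illustrative; the
physical-memory call relies on the HotSpot-specific
com.sun.management.OperatingSystemMXBean, so it is an assumption that this
interface is available on the JVM in use. The reported maximum heap can be
slightly lower than the -Xmx value, depending on the garbage collector.

```java
import java.lang.management.ManagementFactory;

public class HeapCheck {
    // Hypothetical helper: convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Maximum heap this JVM will try to use (driven by -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("JVM max heap: " + toMiB(maxHeap) + " MiB");

        // Physical RAM, only if the HotSpot extension interface is present.
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            long ram = ((com.sun.management.OperatingSystemMXBean) os)
                    .getTotalPhysicalMemorySize();
            System.out.println("Physical RAM: " + toMiB(ram) + " MiB");
        }
    }
}
```

On Java 11 or later it can be run directly with the same flag as the
management server, for example: java -Xmx6G HeapCheck.java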


On Fri, Jul 4, 2025 at 9:57 AM Nischal P <nischalnisc...@gmail.com> wrote:

> Hi Raghava,
>
> Can you please increase the heap size in JAVA_OPTS from -Xmx2G to -Xmx6G
> or so, after checking the RAM available on the system?
>
>
>
>
> Thanks & Regards
> Nischal
>
>
> On Thu, Jul 3, 2025 at 9:58 PM Raghava Yerubandi <raghavayeruba...@gmail.com> wrote:
>
> > Hello Community,
> >
> > We recently upgraded our ACS environment from version *4.19.1* to *4.19.3*.
> > Post-upgrade, we are experiencing recurring issues where the
> > *cloudstack-management* service on our management nodes fails
> > intermittently and stays down until it is manually restarted.
> >
> > Upon reviewing the logs from the affected management servers, we
> > observed that *java.lang.OutOfMemoryError: Java heap space* exceptions
> > are being logged repeatedly.
> >
> > Below are the logs.
> >
> > > java[1039537]: INFO  [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:ctx-56efc1be) (logid:e0d2bb71) Running Kubernetes cluster state scanner on Kubernetes cluster : Test-Cluster for state: Alert
> > > java[1039537]: java.lang.OutOfMemoryError: Java heap space
> > > java[1039537]: Dumping heap to /var/log/cloudstack/management/java_pid1039537.hprof ...
> > > java[1039537]: Heap dump file created [1850233163 bytes in 21.923 secs]
> > > java[1039537]: WARN  [c.c.u.d.Merovingian2] (Network-Scavenger-1:ctx-8673d47e) (logid:7e2df3dc) Timed out on acquiring lock networks4235 .  Waited for 600seconds
> > > java[1039537]: com.cloud.utils.exception.CloudRuntimeException: Timed out on acquiring lock networks4235 .  Waited for 600seconds
> > > java[1039537]:         at com.cloud.utils.db.Merovingian2.acquire(Merovingian2.java:151)
> > > java[1039537]:         at com.cloud.utils.db.TransactionLegacy.lock(TransactionLegacy.java:386)
> > > java[1039537]:         at com.cloud.utils.db.GenericDaoBase.acquireInLockTable(GenericDaoBase.java:1074)
> > > java[1039537]:         at jdk.internal.reflect.GeneratedMethodAccessor474.invoke(Unknown Source)
> > > java[1039537]:         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > java[1039537]:         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > > java[1039537]:         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
> > > java[1039537]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
> > > java[1039537]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
> > > java[1039537]:         at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
> > > java[1039537]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
> > > java[1039537]:         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
> > > java[1039537]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
> > > java[1039537]:         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
> > > java[1039537]:         at com.sun.proxy.$Proxy52.acquireInLockTable(Unknown Source)
> > > java[1039537]:         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.shutdownNetwork(NetworkOrchestrator.java:3080)
> > > java[1039537]:         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$NetworkGarbageCollector.reallyRun(NetworkOrchestrator.java:3530)
> > > java[1039537]:         at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$NetworkGarbageCollector.runInContext(NetworkOrchestrator.java:3466)
> > > java[1039537]:         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> > > java[1039537]:         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> > > java[1039537]:         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> > > java[1039537]:         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> > > java[1039537]:         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> > > java[1039537]:         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> > > java[1039537]:         at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> > > java[1039537]:         at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> > > java[1039537]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > java[1039537]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > java[1039537]:         at java.base/java.lang.Thread.run(Thread.java:829)
> > > java[1039537]: WARN  [o.a.c.e.o.NetworkOrchestrator] (Network-Scavenger-1:ctx-8673d47e) (logid:7e2df3dc) Network with id: 4235 doesn't exist, or unable to acquire lock for it as a part of network shutdown
> > > java[1039537]: ERROR [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-cdbfb71b) (logid:cd4a1b55) We have detected that at least one management server peer reports that this management server is down, perform active fencing
> > > Jul 02 05:43:10 sc-mgmt-03.speedcloud.co.in java[1039537]: INFO  [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-0fb8adb7) (logid:581cfc3f) Found the following agents behind on ping: [58, 67, 116, 129, 147]
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > > java[1039537]: INFO  [o.a.c.v.s.VMSchedulerImpl] (VMSchedulerPollTask:ctx-8b5f50ea) (logid:69b8fd3c) Cleaned up 0 VM scheduled job entries
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > > java[1039537]: INFO  [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:ctx-56efc1be) (logid:e0d2bb71) Running Kubernetes cluster state scanner on Kubernetes cluster : Stack-k8s-Test for state: Alert
> > > java[1039537]: WARN  [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-cbfa09ce) (logid:56ce8a0c) Unable to fetch basic router test results data from router r-15355-VM
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > > java[1039537]: WARN  [c.c.a.d.ParamGenericValidationWorker] (qtp940584193-1652:ctx-d5a00ae4 ctx-0e610389 ctx-a37dcc30) (logid:a55ae204) Received unknown parameters for command listHosts. Unknown parameters : filter
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > > java[1039537]: INFO  [c.c.s.d.VolumeStatsDaoImpl] (StatsCollector-2:ctx-b81c41e3) (logid:a37e0ad7) Removed a total of [0] volume_stats rows older than [Tue Jul 01 17:43:10 UTC 2025].
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> > > java[1039537]: ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-49034cc7) (logid:9cc0b9c6) db statistics collection failed due to / by zero
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.37
> > > java[1039537]: ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-d3b2a30a) (logid:657e156f) db statistics collection failed due to / by zero
> > > java[1039537]: ERROR [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-c8128059) (logid:4b587df9) We have detected that at least one management server peer reports that this management server is down, perform active fencing
> > > Jul 02 05:43:11 sc-mgmt-03.speedcloud.co.in java[1039537]: INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-04653cf4) (logid:514252d2) Investigating why host 58 has disconnected with event PingTimeout
> > > java[1039537]: ERROR [c.c.c.ClusterFenceManagerImpl] (Cluster-Notification-1:ctx-086ed801) (logid:8cd30369) Received node isolation notification, will perform self-fencing and shut myself down
> > > java[1039537]: WARN  [c.c.c.ClusterServiceServletContainer] (Thread-26:null) (logid:) Failure during validating cluster request from 10.232.8.36
> >
> > Additionally, we checked the */etc/default/cloudstack-management*
> > configuration file and found the following Java options configured under
> > *JAVA_OPTS*:
> >
> >
> >
> > > *JAVA_OPTS="-Djava.security.properties=/etc/cloudstack/management/java.security.ciphers
> > > -Djava.awt.headless=true -Dcom.sun.management.jmxremote=false -Xmx2G
> > > -XX:+UseParallelGC -XX:MaxGCPauseMillis=500 -XX:+HeapDumpOnOutOfMemoryError
> > > -XX:HeapDumpPath=/var/log/cloudstack/management/
> > > -XX:ErrorFile=/var/log/cloudstack/management/cloudstack-management.err
> > > -Djava.util.Arrays.useLegacyMergeSort=true "*
> >
> >
> > Also, after removing the *-Djava.util.Arrays.useLegacyMergeSort=true*
> > option from *JAVA_OPTS*, we are unable to fetch the public IPs from
> > either the dashboard or CMK. We get the error *Request failed (431):
> > comparison method violates its general contract*.
> >
>
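
Regarding the second issue in the quoted message: as far as we understand, the
"comparison method violates its general contract" error comes from the JDK's
default sort (TimSort), which throws an IllegalArgumentException when it
notices that a Comparator gives inconsistent answers. The
-Djava.util.Arrays.useLegacyMergeSort=true option switches sorting back to the
older merge sort, which does not perform that check, and that would explain why
the error only shows up once the option is removed. We have not identified
which comparator in the management server is involved. The sketch below uses a
hypothetical subtraction-based comparator (BROKEN) and a hypothetical
brute-force checker (obeysContract) purely to illustrate how such a violation
can arise; neither is taken from the CloudStack code.

```java
import java.util.Comparator;
import java.util.List;

public class ComparatorContractCheck {
    // Brute-force check over the samples: compare(a, b) and compare(b, a)
    // must have opposite signs, and a < b with b < c must imply a < c.
    static <T> boolean obeysContract(Comparator<T> cmp, List<T> samples) {
        for (T a : samples) {
            for (T b : samples) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                if (ab != -ba) {
                    return false; // antisymmetry violated
                }
                for (T c : samples) {
                    if (ab < 0 && cmp.compare(b, c) < 0 && cmp.compare(a, c) >= 0) {
                        return false; // transitivity violated
                    }
                }
            }
        }
        return true;
    }

    // Hypothetical buggy comparator: the subtraction overflows for large gaps.
    static final Comparator<Integer> BROKEN = (a, b) -> a - b;

    // Overflow-safe equivalent.
    static final Comparator<Integer> SAFE = Integer::compare;

    public static void main(String[] args) {
        List<Integer> samples = List.of(Integer.MIN_VALUE, 0, 1);
        System.out.println("subtraction comparator ok? " + obeysContract(BROKEN, samples)); // false
        System.out.println("Integer.compare ok?        " + obeysContract(SAFE, samples));   // true
    }
}
```

If this reading is right, keeping the useLegacyMergeSort option is a workaround
that hides the inconsistency, and the underlying fix would be in whichever
comparator sorts the public IP list.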
