Hi Kumar,

I tried to read all your questions. It’s really hard to understand the problem, 
but, to add to what others already said, I would suggest the following:

  1.  Try to reproduce the OOM issue in a simplified setup, maybe even without 
Kubernetes involved.
  2.  Remove the -Xms and -Xmx JVM options and start using -XX:MaxRAMPercentage. 
It lets the heap be sized automatically based on the memory available to the 
JVM. This is especially important in containers, where memory availability can 
change dynamically.
You didn’t say which JVM you are using. On HotSpot this parameter defaults to 
25%; on OpenJ9 it defaults to 50%. For Artemis I would advise starting with 
50%. If your memory configuration is on the higher end, then maybe go to 60 or 
65%. Don’t go above 70%.
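As a sketch, in a typical Artemis installation this change goes into the 
JAVA_ARGS line of artemis.profile (the exact file and variable name depend on 
your setup, so treat this as illustrative):

```
# Before (fixed heap, blind to container memory changes):
JAVA_ARGS="-Xms6g -Xmx6g ..."

# After (heap sized as a percentage of the container's memory):
JAVA_ARGS="-XX:InitialRAMPercentage=25.0 -XX:MaxRAMPercentage=50.0 ..."
```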
  3.  Another important parameter is -XX:MaxDirectMemorySize. Netty uses direct 
memory; others may correct me if I’m wrong, but I assume most of the direct 
memory in Artemis is used by those Netty connections.
By default -XX:MaxDirectMemorySize is set to the same _inferred_ value as -Xmx 
or -XX:MaxRAMPercentage. If you keep -XX:MaxRAMPercentage low, you can 
probably ignore the direct memory parameter for now. However, if you set 
-XX:MaxRAMPercentage to something like 60% and -XX:MaxDirectMemorySize is not 
set, and the container has, let’s say, 10 GB of RAM available, the maximum 
heap size will be 6 GB and the maximum direct memory size will also be 6 GB. 
12 GB is obviously more than 10 GB. In other words, the higher you push the 
heap percentage, the higher the probability that heap plus direct memory 
together consume all available memory. On top of that, the JVM uses additional 
memory for its own internals (metaspace, thread stacks, code cache, etc.). 
Sadly, -XX:MaxDirectMemorySize cannot be set as a percentage of total memory. 
For the most stable results, set -XX:MaxDirectMemorySize so that heap + direct 
memory never exceeds 80-85% of total memory. But before that, I would measure 
what’s really happening in the JVM first.
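To make the arithmetic concrete, a hypothetical sizing for a 10 GB container 
that stays inside the 80-85% budget could look like this (the numbers are 
illustrative, not a recommendation):

```
# 10 GB container, targeting heap + direct <= ~8 GB (80%):
-XX:MaxRAMPercentage=60.0       # heap capped at ~6 GB
-XX:MaxDirectMemorySize=2g      # direct memory capped at 2 GB
# The remaining ~2 GB is left for metaspace, thread stacks, code cache, etc.
```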
  4.  The interesting part is Netty. Depending on your JVM and its version, 
Netty uses different strategies for memory allocation. The most interesting 
parameter for you is -Dio.netty.maxDirectMemory 
https://github.com/netty/netty/blob/f80b70c75ed7dff27d7e74d2c18ca8a0724a1cc7/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L156
 Netty uses this parameter to determine the maximum direct memory available to 
the JVM; however, it uses both DirectByteBuffer and Unsafe memory allocation 
strategies (again, depending on your JVM). This means you cannot see the full 
memory usage produced by Netty using JVM tools alone (JMX, for example). See 
this comment https://github.com/netty/netty/issues/11895#issuecomment-987739006 
and the whole thread for more information on that. Also, don’t miss this great 
explanation 
https://netty.io/wiki/java-24-and-sun.misc.unsafe.html#what-netty-uses-unsafe-for
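To illustrate why JMX alone undercounts: allocations made through 
DirectByteBuffer show up in the platform’s "direct" BufferPoolMXBean, while 
memory obtained via Unsafe.allocateMemory (which Netty may use internally, 
depending on JVM and version) does not appear there at all. A minimal 
pure-JDK sketch of the visible half:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryVisibility {

    // Returns how much the JMX-visible "direct" buffer pool grows when we
    // allocate a 16 MB DirectByteBuffer.
    static long trackedGrowth() {
        BufferPoolMXBean direct = ManagementFactory
                .getPlatformMXBeans(BufferPoolMXBean.class).stream()
                .filter(b -> "direct".equals(b.getName()))
                .findFirst()
                .orElseThrow(IllegalStateException::new);

        long before = direct.getMemoryUsed();
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = direct.getMemoryUsed();
        buf.put(0, (byte) 1); // keep the buffer alive past the measurement
        return after - before;
    }

    public static void main(String[] args) {
        System.out.println("JMX-visible direct pool grew by "
                + trackedGrowth() + " bytes");
        // Memory allocated via sun.misc.Unsafe.allocateMemory would NOT
        // register in this pool, which is why JMX alone can undercount
        // Netty's real direct memory footprint.
    }
}
```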
  5.  You should be able to see what’s going on memory-wise in Artemis and the 
JVM (including all the memory used by Netty 
https://artemis.apache.org/components/artemis/documentation/latest/metrics.html#netty-allocator)
 using the Artemis Prometheus Metrics plugin 
https://github.com/rh-messaging/artemis-prometheus-metrics-plugin. This should 
give you a much clearer picture of which memory parameters and values you need 
to set.
  6.  Also, don’t forget that on Kubernetes the JVM derives its maximum 
available memory from the pod’s “limits” value. Never set “limits” to a larger 
value than the memory available on the smallest node.
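As a sketch, this is the piece of the pod spec the JVM sizes itself from; with 
the limit below and -XX:MaxRAMPercentage=50, the heap would be capped at about 
5 GB (values are illustrative):

```yaml
resources:
  requests:
    memory: "10Gi"
  limits:
    memory: "10Gi"   # the container-aware JVM reads its memory budget from this
```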

And last but not least: unless you have a very specific reason to do so, I 
personally advise against running Artemis (or any broker, for that matter) on 
Kubernetes. Most brokers use memory and CPU intensively anyway. One might see 
the operator as a convenient configuration option, but there are plenty of 
tools for modern configuration management on plain VMs. In summary, I don’t 
see how Kubernetes provides any value for such a use case. It just complicates 
debugging and maintenance. You also don’t have the full power of the Linux 
kernel under memory pressure; for example, swap is (mostly) not available on 
Kubernetes.

Hope that helps.

--
   Best Regards,
    Vilius

From: Shiv Kumar Dixit <[email protected]>
Sent: Wednesday, January 14, 2026 2:44 PM
To: [email protected]
Subject: K8s broker pod getting killed with OOM

We are hosting the Artemis broker in Kubernetes using an operator-based 
solution. We deploy the broker as a StatefulSet with 2 or 4 replicas. We 
assign, e.g., 6 GB for heap, 9 GB for the pod, and 1.2 GB (1/5 of max heap) 
for global-max-size. All addresses normally use -1 for max-size-bytes, but 
some less frequently used queues are defined with 100KB for max-size-bytes to 
allow early paging.

We have the following observations:
1. As the broker pod starts, the broker container immediately occupies 6 GB 
for max heap. This seems expected, as both min and max heap are the same.
2. Pod memory usage starts at 6+ GB, and once we have pending messages, 
legitimate producers and consumers connecting, invalid SSL attempts, broker 
GUI access, etc. during normal broker operations, pod memory usage keeps 
increasing until it reaches 9 GB.
3. Once the pod hits the 9 GB limit, K8s kills the pod with an OOMKilling 
event and restarts it. Here we don’t see the broker container getting killed 
with OOM; rather, the pod is killed and restarted, which forces the broker to 
restart.
4. We have configured artemis.profile to capture a memory dump in case of 
broker OOM, but it never happens. So we assume the broker process is not going 
out of memory; the pod is, due to increased non-heap usage.
5. The only way to recover is to increase the heap and pod memory limits from 
6 GB and 9 GB to higher values and wait for the next occurrence.

1. Is there any way to analyse what is going wrong with non-heap native memory 
usage?
2. Is non-heap native memory expected to grow to such an extent due to pending 
messages, SSL errors, etc.?
3. Is there any parameter we can use to restrict non-heap native memory usage?
4. Can Netty, which handles the connection side of the broker, create such 
memory consumption and cause the pod OOM?
5. Is there any monitoring parameter that can hint that the pod is in danger 
of being killed?

Thanks
Shiv
