RE: K8s broker pod getting killed with OOM

Shiv Kumar Dixit Tue, 20 Jan 2026 09:13:03 -0800

Hi Arthur/Thorsten,
I was analysing the memory settings of the broker container and came across a
few things. Some input from you would be useful.

  1.  So far we have not explicitly set -XX:MaxDirectMemorySize, so the JVM
default applies; this could be one cause of the OOM.
  2.  We have also not explicitly set -Dio.netty.maxDirectMemory and, as the
broker uses Netty heavily, this could be another cause.
  3.  We will try to fix both based on recommendations.
  4.  For testing, I set both min and max heap to 6000 MB and container request
limit to 9000 MB, and checked the following.

     *   The native memory summary came out as below. Committed heap shows
6144000KB, or 6000 MB, and total committed JVM memory (heap + non-heap) shows
6666361KB (6560172KB of it mmap), which is more than max heap.

Total: reserved=7349741KB, committed=6666361KB
       malloc: 106189KB #580494
       mmap:   reserved=7243552KB, committed=6560172KB

Java Heap (reserved=6144000KB, committed=6144000KB)
          (mmap: reserved=6144000KB, committed=6144000KB)

     *   I checked the container stats with top, and they came out as below. If
the top command gives total container memory usage (heap + non-heap + the
container’s internal needs), why is it less than the native memory committed
value? Ideally, JVM memory usage (heap + non-heap) should be less than the
container’s current memory usage, right?

k -n artemis-demo top pod demo-broker-ss-0 --containers
POD                NAME                    CPU(cores)   MEMORY(bytes)
demo-broker-ss-0   demo-broker-container   18m          6010Mi
demo-broker-ss-0   vault-agent             1m           35Mi

Here the container’s total memory comes to 6010Mi (6301942KB), which is less
than the total committed memory value (6666361KB) from the native memory step.
Is this feasible?

     *   I checked JMX memory in the Artemis broker GUI, and it came out as
below. Committed heap 6291456000 matches the max heap defined, i.e. 6000 MB.
Non-heap committed is 139657216. Total committed JVM memory usage appears to be
6431113216 bytes, or 6280384 KB. This also does not match the total committed
value of 6666361KB from the native memory step. However, it is less than the
container’s total memory usage of 6010Mi (6301942KB).

Heap memory usage -> { "init": 6291456000, "committed": 6291456000, "max": 
6291456000, "used": 4526118136 }
Non heap memory usage -> { "init": 7667712, "committed": 139657216, "max": 
1224736768, "used": 134421688 }

                    Should we expect all 3 results to be in a matching range?
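
One thing that may help when comparing the three readings above is to put them
in the same unit first. The following is a minimal Python sketch, assuming that
the native memory summary's "KB" means 1024 bytes, that the "Mi" in the top
output means 1024*1024 bytes, and that the JMX values are plain bytes. The
function and dictionary names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * 1024


def nmt_kb_to_mib(kb):
    """Convert a native-memory-summary figure (assumed 1024-byte KB) to MiB."""
    return kb * KIB / MIB


def bytes_to_mib(num_bytes):
    """Convert a plain byte count (as shown by JMX) to MiB."""
    return num_bytes / MIB


# Figures copied from the three readings in this message.
readings_mib = {
    "NMT total committed": nmt_kb_to_mib(6666361),
    "NMT heap committed": nmt_kb_to_mib(6144000),
    "JMX heap + non-heap committed": bytes_to_mib(6291456000 + 139657216),
    "top (broker container)": 6010.0,
}

for name, mib in readings_mib.items():
    print(f"{name:32s} {mib:9.1f} MiB")
```

Under these assumptions the values come out at roughly 6510 MiB (native memory
summary), 6133 MiB (JMX) and 6010 MiB (top). The 6301942KB figure quoted for
6010Mi appears to be in units of 1000 bytes, so it is not directly comparable
with the other KB values. As far as I understand, committed memory is what the
JVM has obtained from the OS and not necessarily pages it has touched, so a
container reading below the committed total is not contradictory in itself.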

Best Regards
Shiv

From: Arthur Naseef <[email protected]>
Sent: 15 January 2026 09:59 PM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

Unverified Sender: The sender of this email has not been verified. Review the 
content of the message carefully and verify the identity of the sender before 
acting on this email: replying, opening attachments or clicking links.

Shiv:

I recommend you look up OOMKilled in Kubernetes.  I did that myself and found a
page explaining how Kubernetes nodes and PODs are allocated.

From what you are describing, I think Artemis is working as-intended, and the 
JVM is working correctly.  Artemis cannot use more heap than the JVM allows.  
And the Linux OOM Killer (in the Linux Kernel itself) doesn't care about heap 
versus other types of memory used by the JVM - it only cares about the total 
amount of memory used by the JVM process - together with the total system 
memory and memory allocated by all processes.
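
To make the heap-versus-process point concrete, here is a rough Python sketch
of a budget check against a container limit. The 6000 MB heap and 9000 MB
limit are test values mentioned in this thread; the 2000 MB and 500 MB figures
are placeholders. The assumption that direct memory is capped at the max heap
size when -XX:MaxDirectMemorySize is unset reflects my understanding of HotSpot
and should be verified for the JVM in use. Netty's own
-Dio.netty.maxDirectMemory accounting is not modelled.

```python
def worst_case_process_mb(heap_mb, direct_mb=None, other_native_mb=0):
    """Upper bound on JVM process memory under the stated assumptions.

    direct_mb=None models -XX:MaxDirectMemorySize being unset; the
    assumption (to be verified) is that the cap then equals max heap.
    other_native_mb is a placeholder for metaspace, thread stacks, etc.
    """
    if direct_mb is None:
        direct_mb = heap_mb
    return heap_mb + direct_mb + other_native_mb


def fits(limit_mb, **kwargs):
    """True if the worst case stays within the container memory limit."""
    return worst_case_process_mb(**kwargs) <= limit_mb


# Test values from this thread: 6000 MB heap, 9000 MB container limit.
print(worst_case_process_mb(heap_mb=6000))  # 12000
print(fits(9000, heap_mb=6000))             # False

# Hypothetical explicit cap: 2000 MB direct memory, 500 MB other native use.
print(fits(9000, heap_mb=6000, direct_mb=2000, other_native_mb=500))  # True
```

The point of the sketch is only that the kernel compares the sum against the
limit, so a heap that fits comfortably does not mean the process does.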

Clebert has some advice on your tuning and settings - I recommend you follow 
his advice.  But I doubt that will stop the OOMKiller problems.

So 2 things I'd like to share here:

  1.  In your description of the test queue flow and broker log messages around 
paging - it all looks normal, and those log messages do NOT indicate a blocked 
consumer (which you suggested was the case in point #4).
  2.  My suspicion is that your Kubernetes PODs are overallocating the memory 
on their NODE and then when they go to use more memory than the node actually 
has available, the OOM Killer kicks off. (That's the heart of the OOM Killer - 
overallocation of memory when processes request it initially).
For #2, I saw an article explaining that Kubernetes does allow overallocation 
of PODS on a NODE.  It appears that it assigns PODS to NODES based on the 
"memory request" - which is a minimum requested memory size for the POD.  
However, PODS are allowed to expand up to their memory limit, which can then 
cause the actual NODE system memory limit to be exceeded.
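
The request-versus-limit effect described above can be illustrated with a
small Python sketch. The node size and pod figures are invented for
illustration; only the idea that placement follows requests while usage may
grow up to the limits comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    request_gb: float  # what placement on the node is based on
    limit_gb: float    # how far the pod is allowed to grow


def node_summary(node_memory_gb, pods):
    """Compare summed requests and summed limits with the node's memory."""
    total_requests = sum(p.request_gb for p in pods)
    total_limits = sum(p.limit_gb for p in pods)
    return {
        "requests_gb": total_requests,
        "limits_gb": total_limits,
        "fits_by_request": total_requests <= node_memory_gb,
        "overcommitted_by_limit": total_limits > node_memory_gb,
    }


# Hypothetical node with 64 GB and three pods placed on it.
pods = [Pod("broker-a", 20, 40), Pod("broker-b", 20, 40), Pod("other", 10, 20)]
summary = node_summary(64, pods)
print(summary)
```

In this made-up case the requests (50 GB) fit on the node, but the limits
(100 GB) do not, which is the situation where the node can run out of memory
even though every pod stays within its own limit.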

A simple way to verify my assertions would be to create a process that performs 
a fixed memory allocation of the total POD size, and make sure the process 
actually writes to all of those pages, to force the O/S to allocate physical 
pages.  Then see if that simple process leads to the OOM Killer at various 
sizes.  Note that I don't know of a way to write such a program in Java, but it 
would be a fairly simple C program.
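
As an alternative to the C program mentioned above, the same idea can be
sketched in Python: map a fixed amount of anonymous memory and write one byte
to every page so that the O/S has to back each page with physical memory. The
function name and the 4 MiB demo size are arbitrary. A real test would pass
the POD size and keep the process alive long enough to observe the result.

```python
import mmap


def allocate_and_touch(size_bytes):
    """Map size_bytes of anonymous memory and write to every page.

    Writing one byte per page is what forces physical pages to be
    allocated; the mapping alone only reserves address space.
    """
    region = mmap.mmap(-1, size_bytes)  # -1 means an anonymous mapping
    for offset in range(0, size_bytes, mmap.PAGESIZE):
        region[offset] = 1
    return region


# Small demo size; increase towards the POD memory size for a real test.
region = allocate_and_touch(4 * 1024 * 1024)
print(f"touched {len(region)} bytes in pages of {mmap.PAGESIZE} bytes")
region.close()
```

Running this inside a container with increasing sizes should show at which
point the process is killed, independently of the broker and the JVM.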

Another test you could perform - use a small broker (say 2gb heap) on a good 
size POD (say 8gb of memory request) and see if you can invoke the OOM Killer.

Cheers!

Art

On Thu, Jan 15, 2026 at 9:08 AM Shiv Kumar Dixit <[email protected]> wrote:
We are using the open-source one from ArkMQ.

Best Regards
Shiv

From: Clebert Suconic <[email protected]>
Sent: 15 January 2026 08:26 PM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

>>
We are using Artemis 2.37.0 version in K8s and Artemis IO operator version is 
1.2.5.

Which one? The commercial version from Red Hat / OpenShift, or the open-source
one from ArkMQ?

On Thu, Jan 15, 2026 at 9:51 AM Clebert Suconic <[email protected]> wrote:
Those points you described are the reason why I am suggesting using
max-address-size on every destination...

have max-size at say 20M for every destination (make 100K for small 
destinations if you like, but I think 20M for every destination is mostly okay, 
unless you have a lot of destinations).

have max-read at 10M...

This should then optimize your memory usage
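
A rough way to see why the number of destinations matters with a
per-destination cap is to multiply it out. The Python sketch below uses the
20M and 10M figures from above and treats "M" as 1024*1024 bytes, which is how
the 20971520 value in the broker log elsewhere in this thread reads. Treating
the read limit as additive to the size limit is a simplifying assumption, the
destination counts are made up, and per-message overhead is ignored.

```python
MIB = 1024 * 1024


def paged_memory_bound(num_destinations, max_size_bytes, max_read_bytes=0):
    """Rough upper bound on message bytes held in memory across destinations.

    Assumes each destination holds at most max_size_bytes before paging,
    plus at most max_read_bytes read back from paging (a simplification).
    """
    return num_destinations * (max_size_bytes + max_read_bytes)


for count in (10, 100, 1000):
    bound = paged_memory_bound(count, 20 * MIB, 10 * MIB)
    print(f"{count:5d} destinations -> up to {bound // MIB} MiB of messages")
```

With 100 destinations this already comes to about 3000 MiB, which is why a
smaller cap for small destinations may be worth considering when there are
many of them.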

On Thu, Jan 15, 2026 at 6:53 AM Shiv Kumar Dixit <[email protected]> wrote:
Hello Arthur and Clebert
When our broker pod starts, it first starts 2 init containers which terminate 
and release resources after completing the setup. So our pod basically runs 2 
containers – one for vault and another for broker. We verified the memory and 
CPU usage of these init containers and main containers using top pod and it 
shows reasonable data.

Yes, we see that the Linux OOM Killer is invoked, and we are trying to read its
report for any meaningful information.

In the meantime, we have noticed that the scenario below causes the broker
container to be OOM-killed.

  1.  There are a lot of pending messages on a given queue TEST, along with a
small number of pending messages on various other queues. Since we are using
global max size, some of the messages are loaded in memory and the rest are in
the paging folder.

  2.  There are 3-4 consumers on the TEST queue but they are very slow, hence
the pending message backlog is not cleared. We see the log below in the broker:

AMQ224127: Message dispatch from paging is blocked. Address TEST/Queue TEST 
will not read any more messages from paging until pending messages are 
acknowledged. There are currently 5150 messages pending (20972400 bytes) with 
max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520). Either 
increase reading attributes at the address-settings or change your consumers to 
acknowledge more often.

  3.  We also see the log below in the broker:

AMQ224108: Stopped paging on address ‘TEST’; size=62986496 bytes (96016 
messages); maxSize=-1 bytes (-1 messages); globalSize=430581015 bytes (158406 
messages); globalMaxSize=4194304000 bytes (-1 messages);

  4.  Can such a combination of blocked consumers and pending messages cause
the broker pod, which is running with 30 GB of heap and 40 GB of pod memory, to
go OOM?

  5.  Since the consumers were not consuming messages in time and gave consent
to purge the messages, we tried to purge the messages manually via the broker
GUI. Sometimes it worked and more messages got loaded from pages into broker
memory, but many times the broker pod went OOM and restarted.

  6.  This cycle of successful purge or broker restart continued until all
messages from pages were loaded into memory and purged. After the cleanup there
were no more broker restarts.

  7.  Can purging messages via the broker GUI cause OOM even though the broker
pod is running with 30 GB of heap and 40 GB of pod memory?

  8.  What is the best way to optimize the broker configuration in such cases,
where we will always have slow consumers and possibly a lot of pending messages
in memory and paging folders?
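
The AMQ224127 message quoted above carries its own explanation in its numbers:
the pending bytes have reached the configured read limit. Below is a small
Python sketch that pulls the figures out of that one log line and compares
them. The regular expression is written against the single line quoted in this
message and is not a general parser for broker logs; treating -1 as "no limit"
is an assumption based on how the values read.

```python
import re

# The relevant part of the log line, copied from this message.
LOG = (
    "AMQ224127: Message dispatch from paging is blocked. Address TEST/Queue TEST "
    "will not read any more messages from paging until pending messages are "
    "acknowledged. There are currently 5150 messages pending (20972400 bytes) with "
    "max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520)."
)

PATTERN = re.compile(
    r"currently (?P<msgs>\d+) messages pending \((?P<bytes>\d+) bytes\) with "
    r"max reads at maxPageReadMessages\((?P<max_msgs>-?\d+)\) and "
    r"maxPageReadBytes\((?P<max_bytes>-?\d+)\)"
)


def paging_read_status(line):
    """Return the parsed figures and which limit, if any, has been reached."""
    match = PATTERN.search(line)
    if match is None:
        return None
    values = {key: int(text) for key, text in match.groupdict().items()}
    # Assumption: a negative limit means "unlimited".
    over_bytes = values["max_bytes"] >= 0 and values["bytes"] >= values["max_bytes"]
    over_msgs = values["max_msgs"] >= 0 and values["msgs"] >= values["max_msgs"]
    values["blocked_by"] = [
        name for name, hit in (("bytes", over_bytes), ("messages", over_msgs)) if hit
    ]
    return values


status = paging_read_status(LOG)
print(status)
```

For the line above, the pending 20972400 bytes exceed the 20971520-byte
(20 MiB) read limit by 880 bytes, and the message-count limit is not set.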

The impacted broker pod A has a network bridge to another independent broker
pod B in a hub-and-spoke model; pod B has very few connections and almost no
pending messages. We also noticed that if broker pod A goes OOM due to slow
consumers and pending messages as described above, the second broker pod B,
which is connected to pod A over the network bridge, also goes into a restart
loop with OOM. Can a restart of source pod A and the disconnection and
reconnection of a small number of bridges cause target broker pod B to restart?
We have seen this side effect as well.

We are using Artemis 2.37.0 version in K8s and Artemis IO operator version is 
1.2.5.

Best Regards
Shiv

From: Arthur Naseef <[email protected]>
Sent: 15 January 2026 06:40 AM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

So 3100 connections is a large number, but that doesn't sound like a good 
reason for the broker pod to go OOM.  Also, getting up to 40gb, I would say the 
50% rule of thumb may be too conservative (i.e. a higher percentage could be 
reasonable), which is contradicted by your outcome.  Are there other containers 
running in the same POD that might be taking up memory?  Maybe sidecars?

Unfortunately, I don't have a working kubernetes setup available right now.  If 
I did, I could poke around and try to give specific tips on checking the memory 
use of the POD.

Do you know if the Linux OOM killer is getting invoked?  That would be reported 
by the kernel of the node on which the pod was executing.  If you can view that 
report, it includes a lot of useful information, including all of the processes 
involved and the amount of memory used by each.

Art

On Wed, Jan 14, 2026 at 3:52 PM Shiv Kumar Dixit <[email protected]> wrote:
Thanks Clebert and Arthur for the inputs. I will try your suggestions and let
you know how it goes.

I have another observation based on an issue happening in live. Based on input
from Arthur, the current setup is configured with 20 GB heap and 40 GB pod. As
the pod started, we got 3100 connections to the broker and within minutes the
pod got OOMKilled. Is there any relation between the number of connections on
the broker and the pod going OOM?

Best Regards
Shiv

-----Original Message-----
From: Clebert Suconic <[email protected]>
Sent: 15 January 2026 04:06 AM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

So, in summary, what I'm recommending is:

use max-size-messages for all the queues. For your large queues, use something
like 10MB and for your small queues 100K.

Also keep max-read-page-bytes in use; keep it at 20M.

If I could change the past I would have a max-size on every address we deploy,
with global-max-size for the utmost emergency case.
It's something I'm looking to change in Artemis 3.0 or 4.0. (I can't change
that in a minor version, as it could break certain cases, as some users that I
know use heavy filtering and can't really rely on paging.)

On Wed, Jan 14, 2026 at 5:31 PM Clebert Suconic <[email protected]> wrote:
>
> I would recommend against trusting global-max-size, and would use max-size
> for all the addresses.
>
> Also, what are your reading attributes? I would recommend using the
> new prefetch values.
>
>
>
> And also what operator are you using? arkmq? your own?
>
> On Wed, Jan 14, 2026 at 7:44 AM Shiv Kumar Dixit <[email protected]> wrote:
> >
> > We are hosting Artemis broker in Kubernetes using operator-based solution. 
> > We deploy the broker as statefulset with 2 or 4 replicas. We assign for 
> > e.g. 6 GB for heap and 9 GB for pod, 1.2 GB (1/5 of max heap) for 
> > global-max-size. All addresses normally use -1 for max-size-bytes but some 
> > less frequently used queues are defined with 100KB for max-size-bytes to 
> > allow early paging.
> >
> >
> >
> > We have following observations:
> >
> > 1. As the broker pod starts, broker container immediately occupies 6 GB for 
> > max heap. It seems expected as both min and max heap are same.
> >
> > 2. Pod memory usage starts with 6+ GB and once we have pending messages, 
> > good producers and consumers connect to broker, invalid SSL attempts 
> > happen, broker GUI access happens etc. during normal broker operations - 
> > pod memory usage keeps increasing and now reaches 9 GB.
> >
> > 3. Once the pod hits limit of 9 GB, K8s kills the pod with OOMKilling event 
> > and restarts the pod. Here we don’t see broker container getting killed 
> > with OOM rather pod is killed and restarted. It forces the broker to 
> > restart.
> >
> > 4. We have configured artemis.profile to capture memory dump in case of OOM 
> > of broker but it never happens. So, we are assuming broker process is not 
> > going out of memory, but pod is going out of memory due to increased 
> > non-heap usage.
> >
> > 5. Only way to recover here is to increase heap and pod memory limits from 
> > 6 GB and 9 GB to higher values and wait for next re-occurrence.
> >
> >
> >
> > 1. Is there any way to analyse what is going wrong with non-heap native 
> > memory usage?
> >
> > 2. Is non-heap native memory expected to increase to such an extent due to
> > pending messages, SSL errors etc.?
> >
> > 3. Is there any param we can use to restrict the non-heap native memory 
> > usage?
> >
> > 4. Can Netty, which handles the connection aspect of the broker, create
> > such memory consumption and cause OOM of the pod?
> >
> > 5. Can we have any monitoring param that can hint that pod is potentially 
> > in danger of getting killed?
> >
> >
> >
> > Thanks
> >
> > Shiv
>
>
>
> --
> Clebert Suconic

--
Clebert Suconic
