Hi Arthur/Thorsten,
I was analysing the memory settings of the broker container and came across a
few things. Some input from you would be useful.
1. As of now we have not explicitly defined -XX:MaxDirectMemorySize, so the
JVM default applies, which could be a potential cause of OOM.
2. We have also not explicitly defined -Dio.netty.maxDirectMemory, and as the
broker uses Netty heavily, that could be another potential cause of OOM.
3. We will try to fix both based on recommendations.
4. For testing, I set both min and max heap to 6000 MB and the container
request limit to 9000 MB, and checked the following.
* Native memory summary, shown below. Committed heap shows 6144000KB
(6000 MB), and committed mmap memory (heap + non-heap) shows 6560172KB, which
is more than the max heap.
Total: reserved=7349741KB, committed=6666361KB
       malloc: 106189KB #580494
       mmap:   reserved=7243552KB, committed=6560172KB
Java Heap (reserved=6144000KB, committed=6144000KB)
          (mmap: reserved=6144000KB, committed=6144000KB)
* I checked the container stats with top, shown below. If the top command
gives the total container memory usage (heap + non-heap + the container’s
internal needs), why is it less than the native memory committed value? Ideally
JVM memory usage (heap + non-heap) should be less than the container’s current
memory usage, right?
k -n artemis-demo top pod demo-broker-ss-0 --containers
POD                NAME                    CPU(cores)   MEMORY(bytes)
demo-broker-ss-0   demo-broker-container   18m          6010Mi
demo-broker-ss-0   vault-agent             1m           35Mi
Here the container total memory is 6010Mi (6154240KB in the 1024-byte KB that
the native memory summary uses), which is less than the total committed memory
value (6666361KB) from the native memory step. Is this feasible?
* I checked JMX memory in the Artemis broker GUI, shown below.
Committed heap 6291456000 matches the max heap defined, i.e. 6000 MB. Non-heap
committed is 139657216. Total committed JVM memory usage appears to be
6431113216 bytes or 6280384 KB. This also does not match the total native
memory committed value of 6666361KB from the native memory step, and it is
higher than the container’s total memory usage of 6010Mi (6154240KB).
Heap memory usage -> { "init": 6291456000, "committed": 6291456000, "max":
6291456000, "used": 4526118136 }
Non heap memory usage -> { "init": 7667712, "committed": 139657216, "max":
1224736768, "used": 134421688 }
Should we expect all three results to be in a matching range?
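The three readings use different units: the native memory summary prints KB as
1024-byte units (6144000KB is exactly the 6000 MB heap), kubectl top prints Mi,
and JMX prints bytes. A like-for-like comparison is easier after normalising
them. A minimal Python sketch using only the figures quoted in this message;
the variable names are illustrative:

```python
KIB = 1024
MIB = 1024 * 1024

# Figures quoted in this message.
nmt_total_committed_kib = 6666361       # native memory "Total: committed"
nmt_heap_committed_kib = 6144000        # native memory "Java Heap committed"
top_container_mib = 6010                # kubectl top, Mi = MiB
jmx_heap_committed_bytes = 6291456000   # JMX heap "committed"
jmx_nonheap_committed_bytes = 139657216 # JMX non-heap "committed"


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / MIB


readings_mib = {
    "nmt_total_committed": to_mib(nmt_total_committed_kib * KIB),
    "nmt_heap_committed": to_mib(nmt_heap_committed_kib * KIB),
    "kubectl_top_container": float(top_container_mib),
    "jmx_heap_plus_nonheap": to_mib(
        jmx_heap_committed_bytes + jmx_nonheap_committed_bytes
    ),
}

for name, mib in readings_mib.items():
    print(f"{name:24s} {mib:10.1f} MiB")
```

On these figures the native memory total (about 6510 MiB) is above JMX heap +
non-heap (about 6133 MiB), which is above the kubectl top reading (6010 MiB).
One plausible reading, offered as an assumption rather than a confirmed
diagnosis, is that committed memory is not necessarily resident, while kubectl
top reflects memory the container has actually touched.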
Best Regards
Shiv
From: Arthur Naseef <[email protected]>
Sent: 15 January 2026 09:59 PM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM
Unverified Sender: The sender of this email has not been verified. Review the
content of the message carefully and verify the identity of the sender before
acting on this email: replying, opening attachments or clicking links.
Shiv:
I recommend you look up OOMKilled in Kubernetes. I did that myself and found a
page explaining how Kubernetes nodes and PODs are allocated.
From what you are describing, I think Artemis is working as-intended, and the
JVM is working correctly. Artemis cannot use more heap than the JVM allows.
And the Linux OOM Killer (in the Linux Kernel itself) doesn't care about heap
versus other types of memory used by the JVM - it only cares about the total
amount of memory used by the JVM process - together with the total system
memory and memory allocated by all processes.
Clebert has some advice on your tuning and settings - I recommend you follow
his advice. But I doubt that will stop the OOMKiller problems.
So 2 things I'd like to share here:
1. In your description of the test queue flow and broker log messages around
paging - it all looks normal, and those log messages do NOT indicate a blocked
consumer (which you suggested was the case in point #4).
2. My suspicion is that your Kubernetes PODs are overallocating the memory
on their NODE and then when they go to use more memory than the node actually
has available, the OOM Killer kicks off. (That's the heart of the OOM Killer -
overallocation of memory when processes request it initially).
For #2, I saw an article explaining that Kubernetes does allow overallocation
of PODS on a NODE. It appears that it assigns PODS to NODES based on the
"memory request" - which is a minimum requested memory size for the POD.
However, PODS are allowed to expand up to their memory limit, which can then
cause the actual NODE system memory limit to be exceeded.
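The request-versus-limit gap described above can be illustrated with a little
arithmetic. This is a Python sketch with made-up numbers: the pod names, the
request and limit sizes, and the node size are all hypothetical and are not
taken from the cluster discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    request_mib: int  # what scheduling is based on
    limit_mib: int    # what the pod may grow to


def node_overcommit(pods, node_allocatable_mib):
    """Compare summed requests and summed limits with the node's memory."""
    total_requests = sum(p.request_mib for p in pods)
    total_limits = sum(p.limit_mib for p in pods)
    return {
        "total_requests_mib": total_requests,
        "total_limits_mib": total_limits,
        "fits_by_request": total_requests <= node_allocatable_mib,
        "limits_exceed_node": total_limits > node_allocatable_mib,
    }


# Hypothetical node with two broker pods.
pods = [Pod("broker-a", 4096, 9000), Pod("broker-b", 4096, 9000)]
report = node_overcommit(pods, node_allocatable_mib=16000)
print(report)
```

With these hypothetical values both pods fit on the node by request, yet their
combined limits exceed what the node has, which is the situation where the OOM
Killer can step in even though no single pod has passed its own limit.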
A simple way to verify my assertions would be to create a process that performs
a fixed memory allocation of the total POD size, and make sure the process
actually writes to all of those pages, to force the O/S to allocate physical
pages. Then see if that simple process leads to the OOM Killer at various
sizes. Note that I don't know of a way to write such a program in Java, but it
would be a fairly simple C program.
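For anyone who prefers a script over C, the same idea can be sketched in
Python. This is only an illustration of the test described above, not a
vetted tool: the function name and the 64 MiB default are hypothetical, and
the size would need to be raised toward the POD memory size for a real test.

```python
import mmap


def allocate_and_touch(size_bytes):
    """Map size_bytes of anonymous memory and write one byte to every page.

    Writing to each page forces the O/S to back it with physical memory
    instead of leaving it as an untouched reservation.
    """
    page = mmap.PAGESIZE
    region = mmap.mmap(-1, size_bytes)  # anonymous mapping, no file behind it
    touched = 0
    for offset in range(0, size_bytes, page):
        region[offset] = 1
        touched += 1
    return region, touched


if __name__ == "__main__":
    size_mib = 64  # hypothetical starting size
    region, pages = allocate_and_touch(size_mib * 1024 * 1024)
    print(f"touched {pages} pages of {mmap.PAGESIZE} bytes")
    region.close()
```

To observe the container from outside, the process would need to stay alive
after touching the pages (for example by sleeping before close), and the size
can then be stepped up between runs.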
Another test you could perform - use a small broker (say 2gb heap) on a good
size POD (say 8gb of memory request) and see if you can invoke the OOM Killer.
Cheers!
Art
On Thu, Jan 15, 2026 at 9:08 AM Shiv Kumar Dixit
<[email protected]> wrote:
We are using the open source one from ArkMQ.
Best Regards
Shiv
From: Clebert Suconic <[email protected]>
Sent: 15 January 2026 08:26 PM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM
> We are using Artemis 2.37.0 version in K8s and Artemis IO operator version is
> 1.2.5.
Which one? The commercial version from Red Hat / OpenShift, or the open source
one from ArkMQ?
On Thu, Jan 15, 2026 at 9:51 AM Clebert Suconic
<[email protected]> wrote:
Those points you described are the reason why I am suggesting using
max-address-size on every destination...
Have max-size at, say, 20M for every destination (make it 100K for small
destinations if you like, but I think 20M for every destination is mostly okay,
unless you have a lot of destinations).
Have max-read at 10M...
This should then optimize your memory usage.
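To get a feel for why the number of destinations matters with these values, a
rough Python estimate multiplies the per-destination settings by the
destination count. This is a back-of-the-envelope sketch, not how the broker
accounts for memory: it assumes one queue per destination, treats max-size and
max-read as simple additive caps, and the count of 100 destinations is
hypothetical.

```python
MIB = 1024 * 1024


def estimate_paging_memory(destinations, max_size_bytes, max_read_bytes):
    """Rough ceiling on message bytes held in memory across destinations."""
    per_destination = max_size_bytes + max_read_bytes
    return destinations * per_destination


estimate = estimate_paging_memory(
    destinations=100,         # hypothetical count
    max_size_bytes=20 * MIB,  # "max-size at say 20M"
    max_read_bytes=10 * MIB,  # "max-read at 10M"
)
print(f"{estimate / MIB:.0f} MiB")
```

Comparing a figure like this against the configured heap gives a quick sanity
check on whether 20M per destination is affordable for a given deployment.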
On Thu, Jan 15, 2026 at 6:53 AM Shiv Kumar Dixit
<[email protected]> wrote:
Hello Arthur and Clebert
When our broker pod starts, it first starts 2 init containers which terminate
and release resources after completing the setup. So our pod basically runs 2
containers – one for vault and another for broker. We verified the memory and
CPU usage of these init containers and main containers using top pod and it
shows reasonable data.
Yes, we see that the Linux OOMKiller is invoked, and we are trying to read its
report to find any meaningful information.
In the meantime, we have noticed that the scenario below is causing the broker
container to be OOMKilled.
1. There are a lot of pending messages on a given queue TEST, along with a
small number of pending messages on various other queues. Since we are using
global max size, some of the messages are loaded in memory and the rest are in
the paging folder.
2. There are 3-4 consumers on the TEST queue, but they are very slow, hence
the pending message backlog is not cleared. We see the log below in the broker:
AMQ224127: Message dispatch from paging is blocked. Address TEST/Queue TEST
will not read any more messages from paging until pending messages are
acknowledged. There are currently 5150 messages pending (20972400 bytes) with
max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520). Either
increase reading attributes at the address-settings or change your consumers to
acknowledge more often.
3. We also see the log below in the broker:
AMQ224108: Stopped paging on address ‘TEST’; size=62986496 bytes (96016
messages); maxSize=-1 bytes (-1 messages); globalSize=430581015 bytes (158406
messages); globalMaxSize=4194304000 bytes (-1 messages);
4. Can such a combination of blocked consumers and pending messages cause a
broker pod running with 30 GB of heap and 40 GB of pod memory to go OOM?
5. Since the consumers were not consuming messages on time and gave consent to
purge the messages, we tried to purge the messages manually via the broker GUI.
Sometimes it worked and more messages got loaded from pages into broker memory,
but many times the broker pod went OOM and restarted.
6. This cycle of successful purge or broker restart continued until all
messages from pages were loaded into memory and purged. After the cleanup there
was no broker restart.
7. Can purging messages via the broker GUI cause OOM even though the broker pod
is running with 30 GB of heap and 40 GB of pod memory?
8. What is the best way to optimize the broker configuration in such cases,
where we will always have slow consumers and possibly a lot of pending messages
in memory and paging folders?
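The AMQ224127 line quoted in the list above already carries the numbers behind
the block: pending bytes against maxPageReadBytes. A small Python sketch that
pulls them out. The regular expression is written against the message text as
quoted here (it is not taken from the Artemis source), and the code assumes
that -1 means the corresponding limit is disabled.

```python
import re

LOG_LINE = (
    "AMQ224127: Message dispatch from paging is blocked. Address TEST/Queue TEST "
    "will not read any more messages from paging until pending messages are "
    "acknowledged. There are currently 5150 messages pending (20972400 bytes) with "
    "max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520)."
)

PATTERN = re.compile(
    r"currently (?P<messages>\d+) messages pending \((?P<bytes>\d+) bytes\) "
    r"with max reads at maxPageReadMessages\((?P<max_messages>-?\d+)\) "
    r"and maxPageReadBytes\((?P<max_bytes>-?\d+)\)"
)


def parse_paging_block(line):
    """Return pending counts and which read limit has been reached, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    info = {key: int(value) for key, value in match.groupdict().items()}
    # A negative limit is treated as "no limit" (assumption stated above).
    info["bytes_limit_reached"] = 0 <= info["max_bytes"] <= info["bytes"]
    info["messages_limit_reached"] = 0 <= info["max_messages"] <= info["messages"]
    return info


info = parse_paging_block(LOG_LINE)
print(info)
```

For the quoted line, the pending 20972400 bytes sit just above the 20971520
byte (20 MiB) read limit, while the message-count limit is disabled, so it is
the byte limit that stops further reads from paging until consumers
acknowledge.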
The impacted broker pod A has a network bridge to another independent broker
pod B in a hub and spoke model; pod B has very few connections and almost no
pending messages. We also noticed that if broker pod A goes OOM due to slow
consumers and pending messages as described above, the second broker pod B,
which is connected to pod A over the network bridge, also goes into a restart
loop with OOM. Can a restart of source pod A and the disconnection and
reconnection of a small number of bridges cause target broker pod B to
restart? We have seen this side effect as well.
We are using Artemis 2.37.0 version in K8s and Artemis IO operator version is
1.2.5.
Best Regards
Shiv
From: Arthur Naseef <[email protected]>
Sent: 15 January 2026 06:40 AM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM
So 3100 connections is a large number, but that doesn't sound like a good
reason for the broker pod to go OOM. Also, getting up to 40gb, I would say the
50% rule of thumb may be too conservative (i.e. a higher percentage could be
reasonable), which is contradicted by your outcome. Are there other containers
running in the same POD that might be taking up memory? Maybe sidecars?
Unfortunately, I don't have a working kubernetes setup available right now. If
I did, I could poke around and try to give specific tips on checking the memory
use of the POD.
Do you know if the Linux OOM killer is getting invoked? That would be reported
by the kernel of the node on which the pod was executing. If you can view that
report, it includes a lot of useful information, including all of the processes
involved and the amount of memory used by each.
Art
On Wed, Jan 14, 2026 at 3:52 PM Shiv Kumar Dixit
<[email protected]> wrote:
Thanks Clebert and Arthur for inputs. I will try your suggestions and let you
know how it goes.
I have another observation based on an issue happening in live. Based on input
from Arthur, the current setup is configured with 20 GB heap and 40 GB pod. As
the pod started, we got 3100 connections to the broker and within minutes the
pod got OOMKilled. Is there any relation between the number of connections on
the broker and the pod going OOM?
Best Regards
Shiv
-----Original Message-----
From: Clebert Suconic <[email protected]>
Sent: 15 January 2026 04:06 AM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM
So, in summary, what I'm recommending is:
use max-size-messages for all the queues. For your large queues, use something
like 10MB and for your small queues 100K.
Also keep max-read-page-bytes in use; keep it at 20M.
If I could change the past I would have a max-size on every address we deploy,
and keep global-max-size for the utmost emergency case.
It's something I'm looking to change in Artemis 3.0 or 4.0. (I can't change
that in a minor version, as it could break certain cases,
as some users that I know use heavy filtering and can't really rely on paging.)
On Wed, Jan 14, 2026 at 5:31 PM Clebert Suconic
<[email protected]> wrote:
>
> I would recommend against trusting global-max-size. and use max-size
> for all the addresses.
>
> Also, what are your reading attributes? I would recommend using the
> new prefetch values.
>
>
>
> And also what operator are you using? arkmq? your own?
>
> On Wed, Jan 14, 2026 at 7:44 AM Shiv Kumar Dixit
> <[email protected]> wrote:
> >
> > We are hosting Artemis broker in Kubernetes using operator-based solution.
> > We deploy the broker as statefulset with 2 or 4 replicas. We assign for
> > e.g. 6 GB for heap and 9 GB for pod, 1.2 GB (1/5 of max heap) for
> > global-max-size. All addresses normally use -1 for max-size-bytes but some
> > less frequently used queues are defined with 100KB for max-size-bytes to
> > allow early paging.
> >
> >
> >
> > We have following observations:
> >
> > 1. As the broker pod starts, broker container immediately occupies 6 GB for
> > max heap. It seems expected as both min and max heap are same.
> >
> > 2. Pod memory usage starts with 6+ GB and once we have pending messages,
> > good producers and consumers connect to broker, invalid SSL attempts
> > happen, broker GUI access happens etc. during normal broker operations -
> > pod memory usage keeps increasing and now reaches 9 GB.
> >
> > 3. Once the pod hits limit of 9 GB, K8s kills the pod with OOMKilling event
> > and restarts the pod. Here we don’t see broker container getting killed
> > with OOM rather pod is killed and restarted. It forces the broker to
> > restart.
> >
> > 4. We have configured artemis.profile to capture memory dump in case of OOM
> > of broker but it never happens. So, we are assuming broker process is not
> > going out of memory, but pod is going out of memory due to increased
> > non-heap usage.
> >
> > 5. Only way to recover here is to increase heap and pod memory limits from
> > 6 GB and 9 GB to higher values and wait for next re-occurrence.
> >
> >
> >
> > 1. Is there any way to analyse what is going wrong with non-heap native
> > memory usage?
> >
> > 2. Is non-heap native memory expected to increase to such an extent due to
> > pending messages, SSL errors etc.?
> >
> > 3. Is there any param we can use to restrict the non-heap native memory
> > usage?
> >
> > 4. Can Netty, which handles the connection aspect of the broker, create
> > such memory consumption and cause OOM of the pod?
> >
> > 5. Can we have any monitoring param that can hint that pod is potentially
> > in danger of getting killed?
> >
> >
> >
> > Thanks
> >
> > Shiv
>
>
>
> --
> Clebert Suconic
--
Clebert Suconic