Artemis - dealing with variable consumer performance on a large cluster

Graham Stewart Tue, 25 Jan 2022 10:57:17 -0800

Hi,
Firstly, thanks to everyone involved in the Artemis project. It's a key
part of the system I work on.


We have a symmetric cluster of 32 brokers split across 2 physical data
centres (16 in each).
All 32 hosts are VMs. Each host runs a JVM with an embedded Artemis broker
and several other JVMs that connect to their local broker and produce
and/or consume messages.
Messages are distributed round-robin across the cluster.

This approach has worked well for us in several other environments where
we've used homogeneous physical hardware. It's also been fine in
environments where we use VMs that are running on undersubscribed physicals.
We're coming up against a problem with VMs on oversubscribed physicals due
to variable performance in consuming/producing messages.

An example may help:

A "job" produces 32,000 messages.
These are distributed round-robin across the 32 brokers - 1,000 messages on
each.
On each host there's a process consuming these 1,000 messages.

When these consumer processes perform similarly, each processes its
1,000 messages in roughly the same time and the job completes. Great.

However, should one of the processes slow down to let's say half speed, we
are left waiting twice as long for the job to complete. Not only that but
there are 31 consumer processes left idle.
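
To make the arithmetic above concrete, here is a small idealized model in
Python. It is not Artemis code: the function names and the per-host rates
(1.0 for a normal consumer, 0.5 for the slow one) are assumptions chosen
only for illustration. It compares the completion time when each host is
stuck with its fixed round-robin share against the theoretical lower
bound if any idle consumer could pick up any remaining message.

```python
# Idealized model of the job described above; not Artemis code.
# All names and rates here are hypothetical, for illustration only.
BROKERS = 32
MESSAGES = 32_000


def static_completion_time(rates, total_messages):
    """Time until the slowest host drains its fixed round-robin share."""
    share = total_messages / len(rates)
    return max(share / rate for rate in rates)


def ideal_completion_time(rates, total_messages):
    """Lower bound if any idle consumer could take any remaining message."""
    return total_messages / sum(rates)


# Rates are messages per unit time; 1.0 is an assumed "normal" speed.
uniform = [1.0] * BROKERS
one_slow = [1.0] * (BROKERS - 1) + [0.5]

print(static_completion_time(uniform, MESSAGES))   # 1000.0
print(static_completion_time(one_slow, MESSAGES))  # 2000.0
print(ideal_completion_time(one_slow, MESSAGES))   # a little over 1000
```

With a fixed share per host, one half-speed consumer doubles the job time,
while the ideal bound grows by under 2%. That gap is what we would like to
recover.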

Do you have any ideas on how we can handle this better? I don't see how the
slow consumer approach of setting the consumer window to zero would help
here. We're really looking for something like the cluster detecting there
are idle consumers and redistributing messages away from the broker with
the slow consumers.

Any pointers you can give me would be really appreciated.

Many thanks

Graham Stewart
