Hi,

Firstly, thanks to everyone involved in the Artemis project; it's a key part of the system I work on.
We have a symmetric cluster of 32 brokers split across 2 physical data centres (16 in each). All 32 hosts are VMs. Each host runs a JVM with an embedded Artemis broker, plus several other JVMs that connect to their local broker and produce and/or consume messages. Messages are distributed round-robin across the cluster.

This approach has worked well for us in several other environments where we've used homogeneous physical hardware. It's also been fine in environments where we use VMs running on undersubscribed physical hosts. We're now coming up against a problem with VMs on oversubscribed physical hosts, because the rate at which they consume and produce messages varies from host to host.

An example may help. A "job" produces 32,000 messages. These are distributed round-robin across the 32 brokers, 1,000 messages on each. On each host there's a process consuming these 1,000 messages. When the consumer processes perform similarly, each processes its 1,000 messages in roughly the same time and the job completes. Great. However, if one of the processes slows down to, let's say, half speed, we are left waiting twice as long for the job to complete. Not only that, but the other 31 consumer processes are left idle while we wait.

Do you have any ideas on how we can handle this better? I don't see how the slow-consumer approach of setting the consumer window size to zero would help here. We're really looking for something like the cluster detecting that there are idle consumers and redistributing messages away from the broker with the slow consumers.

Any pointers you can give me would be really appreciated.

Many thanks,
Graham Stewart
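To put rough numbers on the 32,000-message example, here is a minimal Java sketch comparing the static round-robin split with an idealised case where any idle consumer could take any remaining message. It assumes hypothetical consumer rates (1 message/s normally, 0.5 message/s for the slowed consumer); the class and its methods are illustrative only and are not part of the Artemis API.

```java
import java.util.Arrays;

/** Hypothetical illustration only: not part of the Artemis API. */
public class RoundRobinJobTime {

    /** Job time when messages are split evenly round-robin and each consumer only drains its own broker. */
    static double roundRobinJobTime(int totalMessages, double[] rates) {
        double perBroker = (double) totalMessages / rates.length;
        double slowest = 0.0;
        for (double rate : rates) {
            // The job only finishes when the slowest consumer has drained its share.
            slowest = Math.max(slowest, perBroker / rate);
        }
        return slowest;
    }

    /** Idealised job time if any idle consumer could take any remaining message (perfect redistribution). */
    static double pooledJobTime(int totalMessages, double[] rates) {
        return totalMessages / Arrays.stream(rates).sum();
    }

    /** Total consumer-seconds spent idle under round-robin while waiting for the slowest consumer. */
    static double idleConsumerSeconds(int totalMessages, double[] rates) {
        double perBroker = (double) totalMessages / rates.length;
        double jobTime = roundRobinJobTime(totalMessages, rates);
        double idle = 0.0;
        for (double rate : rates) {
            // Each consumer sits idle from the moment it finishes its share until the job ends.
            idle += jobTime - perBroker / rate;
        }
        return idle;
    }

    public static void main(String[] args) {
        // Assumed rates in messages/second: 32 consumers at 1.0, one slowed to half speed.
        double[] rates = new double[32];
        Arrays.fill(rates, 1.0);
        rates[0] = 0.5;

        System.out.printf("Round-robin job time: %.1f s%n", roundRobinJobTime(32_000, rates));
        System.out.printf("Idle consumer-seconds: %.1f%n", idleConsumerSeconds(32_000, rates));
        System.out.printf("Idealised pooled job time: %.1f s%n", pooledJobTime(32_000, rates));
    }
}
```

With these assumed rates, the round-robin split takes 2,000 s and leaves 31,000 consumer-seconds idle, whereas perfect redistribution would finish in about 1,016 s (32,000 / 31.5).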