
We have been using AMQ in production for quite some time already,
and we are noticing strange behavior on one of our queues.

The situation is as follows:

- we handle clickstream traffic, so once we have identified a user, all of
that user's events are "grouped" by the JMSXGroupID property (a UUID in our
case; we can have millions of these per hour), so that events for the same
user are consumed in order when they arrive in bursts
- we use KahaDB with roughly the following config:

<mKahaDB directory="${activemq.data}/mkahadb">
  <filteredPersistenceAdapters>
    <filteredKahaDB perDestination="true">
      <persistenceAdapter>
        <kahaDB checkForCorruptJournalFiles="true" journalDiskSyncStrategy="PERIODIC" journalDiskSyncInterval="5000" preallocationStrategy="zeros" concurrentStoreAndDispatchQueues="false" />
      </persistenceAdapter>
    </filteredKahaDB>
  </filteredPersistenceAdapters>
</mKahaDB>

- the broker runs on a rather beefy EC2 instance and doesn't seem to hit
any limits: neither file limits, nor IOPS, nor CPU
- the destination policy for this destination is very similar to that of many
other destinations that use the same JMSXGroupID grouping:

<policyEntry queue="suchDestination" producerFlowControl="false"
             memoryLimit="256mb" maxPageSize="5000" maxBrowsePageSize="2000">
        <deadLetterStrategy>
                <individualDeadLetterStrategy queuePrefix="DLQ."
                        useQueueForQueueMessages="true" />
        </deadLetterStrategy>
</policyEntry>

- consumers process messages fairly slowly compared to those on other
destinations (about 50-100ms per message, versus about 10-30ms per message
elsewhere)

- however, we end up in a situation where the consumers are not consuming
at the speed we expect and seem to be waiting for something, while there is
a huge backlog of messages on the remote broker for that destination. The
consumers do not appear to be CPU, IO, or network bound either.

- a symptom is that if we split that queue into two queues and attach the
same number of consumers on the same number of nodes, things somehow get
better. Also, when there is a huge backlog on that queue, if we just rename
it to suchQueue2 on the producers and assign some consumers to it, those
consumers are much faster (for a while) than the consumers on the "old"
suchQueue.

- the queue doesn't have "non-grouped messages", all messages on it have the
JMSXGroupID property and are of the same type.

- increasing or decreasing the number of consumers for that queue seems to
have little effect

- rebooting the consumer apps seems to have little effect once the queue
becomes "slow to consume"
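
For reference, here is a rough back-of-the-envelope sketch of what the
per-message timings above imply per consumer. The class and method names are
made up for illustration (they are not part of our code), it assumes each
consumer handles one message at a time, and the 1000 msg/s arrival rate is a
hypothetical number, not a measurement:

```java
// Back-of-the-envelope throughput estimate; all names are illustrative only.
final class ConsumerThroughput {

    // Upper bound on messages/second for one consumer that handles
    // one message at a time and spends millisPerMessage on each.
    static double maxMessagesPerSecond(double millisPerMessage) {
        return 1000.0 / millisPerMessage;
    }

    // Minimum number of such consumers needed to keep up with a given
    // arrival rate, ignoring broker and network overhead.
    static long consumersNeeded(double arrivalsPerSecond, double millisPerMessage) {
        return (long) Math.ceil(arrivalsPerSecond / maxMessagesPerSecond(millisPerMessage));
    }

    public static void main(String[] args) {
        // Timings from this post: 50-100 ms on the slow queue, 10-30 ms elsewhere.
        System.out.printf("slow queue: %.1f to %.1f msg/s per consumer%n",
                maxMessagesPerSecond(100), maxMessagesPerSecond(50));
        System.out.printf("other queues: %.1f to %.1f msg/s per consumer%n",
                maxMessagesPerSecond(30), maxMessagesPerSecond(10));
        // Hypothetical arrival rate of 1000 msg/s, purely for illustration.
        System.out.println("consumers needed at 1000 msg/s, 100 ms each: "
                + consumersNeeded(1000, 100));
    }
}
```

So a busy consumer on this queue should top out at roughly 10 to 20 messages
per second, versus roughly 33 to 100 on the other destinations. What we see
is consumers that stay well below even that lower ceiling.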

Has anybody experienced this?

In short:

The broker waits a considerable time for consumers that seem to be free
and not busy.

