Hello,

I have experience with very high volume pub/sub using Tibco Rendezvous
(multicast) for an internal monitoring & business analytics system that I
led the development of at eBay. That system routinely had over 1Gbps of data
in flight on the datacenter's GigE network, with dozens of blade servers
publishing, and even more blade servers subscribing.

I'm now at a different company, and we're building products that will have a
similar architecture, though likely more modest data volumes. We're using
the ActiveMQ 5.2.0 release and ActiveMQ-CPP 2.2.2 release. I'm still coming
up to speed on the ActiveMQ architecture, configuration, tools, etc. Over
the last couple weeks I've modified the TopicPublisher and TopicListener
examples to determine what level of throughput can be obtained.

My modified TopicPublisher spins up multiple connections, each connection
publishing to multiple topics. The messages published are BytesMessages that
simply have an array of 1000 random bytes. I use a
ScheduledExecutorService.scheduleAtFixedRate() to run tasks that are
triggered every 10 milliseconds. The tasks send a burst of messages. The
number of messages in the burst is computed to achieve a desired aggregate
bandwidth of data published, specified in Megabits per second.
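
To make the setup concrete, the per-topic publishing task looks roughly
like the sketch below. This is illustrative only: the class and field
names (BurstPublisher, PERIOD_MS, etc.) are placeholders, not the actual
modified TopicPublisher code.

    // Illustrative sketch: one of these runs per topic. A
    // ScheduledExecutorService fires sendBurst() every 10 ms, and each
    // burst sends enough 1000-byte BytesMessages to hit the target rate.
    import java.util.Random;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import javax.jms.BytesMessage;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;

    public class BurstPublisher {
        private static final int PERIOD_MS = 10;   // burst interval
        private static final int MSG_SIZE = 1000;  // payload bytes per message

        private final Session session;
        private final MessageProducer producer;
        private final int messagesPerBurst;
        private final byte[] payload = new byte[MSG_SIZE];

        public BurstPublisher(Session session, MessageProducer producer,
                              double targetMbps) {
            this.session = session;
            this.producer = producer;
            // messages/burst = (target bits/sec * burst period) / bits per message
            this.messagesPerBurst = (int) Math.round(
                targetMbps * 1000000.0 * (PERIOD_MS / 1000.0) / (MSG_SIZE * 8));
            new Random().nextBytes(payload);
        }

        public void start(ScheduledExecutorService scheduler) {
            scheduler.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    sendBurst();
                }
            }, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
        }

        private void sendBurst() {
            try {
                for (int i = 0; i < messagesPerBurst; i++) {
                    BytesMessage msg = session.createBytesMessage();
                    msg.writeBytes(payload);
                    producer.send(msg);
                }
            } catch (JMSException e) {
                e.printStackTrace();
            }
        }
    }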

I was very pleased to find that, with the servers I have available for
testing (8-core 1.6GHz Xeons with 8GB RAM running CentOS 5.2), I was able
to sustain about 500Mbps of physical data (i.e. including TCP header and
OpenWire overhead) from one publisher, through one broker, to one
listener, and run this test for hours without any problems. (For those
used to thinking in terms of messages per second, this is 50K
messages/second with 1K byte messages.) Even better, I could add a second
listener, connecting to the broker on a second ethernet interface, so
that the broker delivered a total of ~1Gbps of data to the two listeners.
This is excellent performance and gave me a great deal of confidence that
we could use ActiveMQ for our products.

However, I am now trying to write a listener using ActiveMQ-CPP 2.2.2,
and am finding that it can't come close to matching the throughput of the
Java listener. I started with the SimpleAsyncConsumer sample and modified
it to spin up multiple connections, with each connection subscribing to a
different topic (equivalent to my modified Java TopicListener). The only
thing this application does is receive the messages as fast as possible
and, for each message, use BytesMessage::getBodyLength() to keep a
running total of bytes received (again, equivalent to the Java listener).
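
For clarity, the per-message work amounts to no more than the sketch
below, written here in Java to mirror the TopicListener (the C++ consumer
does the equivalent with the CMS API; the class and field names are
placeholders):

    // Illustrative sketch: the listener only tallies payload bytes.
    import java.util.concurrent.atomic.AtomicLong;

    import javax.jms.BytesMessage;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    public class ByteCountingListener implements MessageListener {
        private final AtomicLong totalBytes = new AtomicLong();

        public void onMessage(Message message) {
            try {
                if (message instanceof BytesMessage) {
                    // Keep a running total of payload bytes received.
                    totalBytes.addAndGet(((BytesMessage) message).getBodyLength());
                }
            } catch (JMSException e) {
                e.printStackTrace();
            }
        }

        public long getTotalBytes() {
            return totalBytes.get();
        }
    }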

So far, the C++ listener can handle less than a quarter of the volume of
data that the Java listener can. If I keep the data rate low enough, the
C++ listener seems to run fine. But when I push the data rate up to
120Mbps, all three components (publisher, broker, listener) freeze up in
less than half a minute. The broker admin console shows more than 90% of
the memory in use. Killing the listener and the publisher leaves the
broker in the same state, and so far the only remedy I know of is to kill
and restart the broker.

I don't yet know if this is purely a "slow consumer" problem, or if the
consumer becomes "slow" because it deadlocks (I have a pstack output that
I'm going to study today and would be happy to make available). I suspect
the latter, since I haven't yet seen any indications of just "slow"
performance before the lockup happens (but I am not yet looking at advisory
messages, which I realize is a major oversight).

FYI, I am currently using the default configuration for the broker, but I
do the following at runtime to configure the pub/sub (a code sketch of the
publisher-side settings follows the lists):

In the Java publisher:

   1. Sessions are created with AUTO_ACKNOWLEDGE
   2. Delivery mode is NON_PERSISTENT
   3. Time to live is 10 seconds

In the Java TopicListener:

   1. Sessions are created with AUTO_ACKNOWLEDGE
   2. Broker URI does not specify any parameters (i.e. do not specify
   jms.prefetchPolicy.all)
   3. Topic URIs do not specify any parameters (i.e. do not specify
   consumer.maximumPendingMessageLimit)

In the C++ Consumer:

   1. Sessions are created with AUTO_ACKNOWLEDGE
   2. Broker URI includes "?jms.prefetchPolicy.all=2000"
   3. Topic URIs include "?consumer.maximumPendingMessageLimit=4000"
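
As a concrete sketch of the publisher-side items above (Java; the broker
URI and topic name are placeholders, and the consumers differ only in the
URI parameters listed above):

    // Illustrative sketch of the publisher's runtime configuration.
    import javax.jms.Connection;
    import javax.jms.DeliveryMode;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.Topic;

    import org.apache.activemq.ActiveMQConnectionFactory;

    public class PublisherConfig {
        public static MessageProducer createProducer(String brokerUri,
                                                     String topicName)
                throws JMSException {
            ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory(brokerUri);
            Connection connection = factory.createConnection();
            connection.start();

            // 1. Sessions are created with AUTO_ACKNOWLEDGE
            Session session =
                connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = session.createTopic(topicName);

            MessageProducer producer = session.createProducer(topic);
            // 2. Delivery mode is NON_PERSISTENT
            producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);
            // 3. Time to live is 10 seconds
            producer.setTimeToLive(10 * 1000);
            return producer;
        }
    }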

Note that while both the TopicListener and the SimpleAsyncConsumer used
asynchronous dispatch, I have modified both to do synchronous receives in
their own threads. For the C++ consumer, this results in 3 threads per
connection, and I have been testing with 8 connections. One experiment I
want to do today is to revert to asynchronous dispatch, assuming this will
bring me back to 2 threads per connection.
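
For reference, the synchronous-receive variant boils down to a loop like
the one below running in its own thread per consumer (again sketched in
Java; the C++ version uses the corresponding CMS calls, and the names are
placeholders):

    // Illustrative sketch: one receive loop per consumer, each in its own
    // thread, tallying payload bytes.
    import java.util.concurrent.atomic.AtomicLong;

    import javax.jms.BytesMessage;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;

    public class ReceiveLoop implements Runnable {
        private final MessageConsumer consumer;
        private final AtomicLong totalBytes;
        private volatile boolean running = true;

        public ReceiveLoop(MessageConsumer consumer, AtomicLong totalBytes) {
            this.consumer = consumer;
            this.totalBytes = totalBytes;
        }

        public void run() {
            try {
                while (running) {
                    // Block for up to one second waiting for the next message.
                    Message message = consumer.receive(1000);
                    if (message instanceof BytesMessage) {
                        totalBytes.addAndGet(
                            ((BytesMessage) message).getBodyLength());
                    }
                }
            } catch (JMSException e) {
                e.printStackTrace();
            }
        }

        public void stop() {
            running = false;
        }
    }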

I still have further investigation to do, and it is possible that it will
yield enough specifics to file a bug report. It's also possible that I'll
find I've made some newbie mistake. However, in the research I've done so
far, I've seen indications that ActiveMQ-CPP 2.2.1 had known problems
similar to these, and that at least one known deadlock related to
CmsTemplate still exists in the 2.2.2 release.

I am writing because I would appreciate help from AMQ developers or any
experienced users in the AMQ community who would be interested in checking
my work to rule out newbie mistakes. I would be happy to make the source
code for my modified examples available to anyone that is interested.

Some questions I would like to ask here: What is the right way to
configure publishers, brokers, and listeners for high message volumes when
some data loss is entirely acceptable? For example, suppose a system is
allowed only a two-nines (99.0%) SLA for message delivery (measured
monthly) if that is what it takes to achieve high stability. Can the
broker be configured so that it will never deadlock, even if a consumer
deadlocks?

Thanks,
Jim Lloyd
Principal Architect
Silver Tail Systems Inc.
