Feel free to post or send any code that you would like reviewed with
regard to the modified CPP examples; I will take a look and let you know
if I see any obvious gotchas.  The currently known deadlock seems to
crop up only when using CmsTemplate and only at shutdown, so if it is
deadlocking under high volume it's probably something new that we
haven't seen yet.  

Obviously, if you can come up with some samples that can lock up the
client, those would be invaluable in finding the root cause.  

Regards
Tim

On Thu, 2008-12-04 at 11:55 -0800, Jim Lloyd wrote:
> Hello,
> 
> I have experience with very high-volume pub/sub using Tibco Rendezvous
> (multicast) for an internal monitoring & business analytics system whose
> development I led at eBay. That system routinely had over 1Gbps of data
> in flight on the datacenter's GigE network, with dozens of blade servers
> publishing, and even more blade servers subscribing.
> 
> I'm now at a different company, and we're building products that will have a
> similar architecture, though likely more modest data volumes. We're using
> the ActiveMQ 5.2.0 release and ActiveMQ-CPP 2.2.2 release. I'm still coming
> up to speed on the ActiveMQ architecture, configuration, tools, etc. Over
> the last couple weeks I've modified the TopicPublisher and TopicListener
> examples to determine what level of throughput can be obtained.
> 
> My modified TopicPublisher spins up multiple connections, each connection
> publishing to multiple topics. The messages published are BytesMessages that
> simply have an array of 1000 random bytes. I use a
> ScheduledExecutorService.scheduleAtFixedRate() to run tasks that are
> triggered every 10 milliseconds. The tasks send a burst of messages. The
> number of messages in the burst is computed to achieve a desired aggregate
> bandwidth of data published, specified in Megabits per second.
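> 
> To make the pacing concrete: at a target payload rate of R megabits per
> second, with 1000-byte payloads and a burst every 10 ms (100 bursts per
> second), each burst sends roughly R * 1e6 / 8 / 1000 / 100 messages. A
> rough C++/CMS sketch of the same pacing logic follows (the real publisher
> is Java and uses scheduleAtFixedRate; names here are illustrative and
> error handling is omitted):
> 
>     #include <cstdlib>
>     #include <memory>
>     #include <vector>
>     #include <cms/Session.h>
>     #include <cms/MessageProducer.h>
>     #include <cms/BytesMessage.h>
> 
>     // Send one burst of fixed-size payloads, sized so that 100 bursts
>     // per second add up to targetMbps of payload data.
>     void sendBurst(cms::Session* session, cms::MessageProducer* producer,
>                    double targetMbps) {
>         const std::size_t payloadSize = 1000;
>         // bytes/sec = targetMbps * 1e6 / 8; divide by the payload size
>         // and by 100 bursts/sec to get the messages per burst.
>         const int messagesPerBurst = static_cast<int>(
>             targetMbps * 1000000.0 / 8.0 / payloadSize / 100.0);
> 
>         std::vector<unsigned char> payload(payloadSize);
>         for (std::size_t i = 0; i < payloadSize; ++i) {
>             payload[i] = static_cast<unsigned char>(rand() % 256);
>         }
> 
>         for (int i = 0; i < messagesPerBurst; ++i) {
>             std::auto_ptr<cms::BytesMessage> message(
>                 session->createBytesMessage(&payload[0], payloadSize));
>             producer->send(message.get());
>         }
>     }
> 
>     // The producer itself is configured once, up front:
>     //   producer->setDeliveryMode(cms::DeliveryMode::NON_PERSISTENT);
>     //   producer->setTimeToLive(10000);  // 10 seconds, in milliseconds
> 
> At a 400Mbps payload target that works out to 500 messages every 10 ms,
> i.e. the 50K messages/second mentioned below.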
> 
> I was very pleased to find that, with the servers I have available for
> testing (8-core 1.6GHz Xeons with 8GB RAM running CentOS 5.2), I was
> able to sustain about 500Mbps of physical data (i.e. including TCP header
> and OpenWire overhead) from one publisher, through one broker, to one
> listener, and run this test for hours without any problems. (For those used
> to thinking in terms of messages per second, this is 50K messages/second
> with 1K byte messages.) Even better, I was able to add a second listener,
> connecting to the broker on a second ethernet interface, such that the
> broker was delivering a total of ~1Gbps of data to the two listeners. This
> is excellent performance, and it gave me a great deal of confidence that we
> could use ActiveMQ for our products.
> 
> However, I am now trying to write a listener using ActiveMQ-CPP 2.2.2, and
> finding that it can't even come close to achieving the throughput that the
> Java listener achieves. I started with the SimpleAsyncConsumer sample and
> modified it to spin up multiple connections, with each connection
> subscribing to a different topic (equivalent to my modified java
> TopicListener). The only thing this application does is receive the messages
> as fast as possible, and for each message use BytesMessage::getBodyLength()
> to keep a running total of bytes received (again, equivalent to the java
> listener).
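> 
> In outline, each receive thread does nothing more than the following (a
> simplified sketch; names are illustrative and cleanup/error handling is
> omitted):
> 
>     #include <memory>
>     #include <cms/Message.h>
>     #include <cms/BytesMessage.h>
>     #include <cms/MessageConsumer.h>
> 
>     // Pull messages synchronously and total the payload bytes reported
>     // for each BytesMessage.  Returns when receive() returns NULL,
>     // i.e. when the consumer/connection has been closed.
>     unsigned long long drainConsumer(cms::MessageConsumer* consumer) {
>         unsigned long long totalBytes = 0;
>         while (true) {
>             std::auto_ptr<cms::Message> message(consumer->receive());
>             if (message.get() == NULL) {
>                 break;
>             }
>             const cms::BytesMessage* bytesMessage =
>                 dynamic_cast<const cms::BytesMessage*>(message.get());
>             if (bytesMessage != NULL) {
>                 totalBytes += bytesMessage->getBodyLength();
>             }
>         }
>         return totalBytes;
>     }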
> 
> So far, the C++ listener can handle less than a quarter of the volume of
> data that the Java listener can. If I keep the data rate low enough, the
> C++ listener seems to run fine.  But when I push the data rate up to
> 120Mbps, all three components (publisher, broker, listener) freeze up in
> less than half a minute. The broker admin console shows greater than 90%
> of the memory in use. Killing the listener and the publisher leaves the
> broker in the same state, and so far the only solution I am aware of is
> to kill and restart the broker.
> 
> I don't yet know if this is purely a "slow consumer" problem, or if the
> consumer becomes "slow" because it deadlocks (I have a pstack output that
> I'm going to study today and would be happy to make available). I suspect
> the latter, since I haven't yet seen any indications of just "slow"
> performance before the lockup happens (but I am not yet looking at advisory
> messages, which I realize is a major oversight).
> 
> FYI, I am currently using the default configuration for the broker, but I do
> the following at runtime to configure the pub/sub:
> 
> In the Java publisher:
> 
>    1. Sessions are created with AUTO_ACKNOWLEDGE
>    2. Delivery mode is NON_PERSISTENT
>    3. Time to live is 10 seconds
> 
> In the Java TopicListener:
> 
>    1. Sessions are created with AUTO_ACKNOWLEDGE
>    2. Broker URI does not specify any parameters (i.e. do not specify
>    jms.prefetchPolicy.all)
>    3. Topic URIs do not specify any parameters (i.e. do not specify
>    consumer.maximumPendingMessageLimit)
> 
> In the C++ Consumer:
> 
>    1. Sessions are created with AUTO_ACKNOWLEDGE
>    2. Broker URI includes "?jms.prefetchPolicy.all=2000"
>    3. Topic URIs include "?consumer.maximumPendingMessageLimit=4000"
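> 
> Concretely, in the C++ consumer those options end up in the URIs passed
> to the connection factory and to createTopic(), roughly as follows (the
> broker host and topic name are placeholders, and object ownership is
> glossed over for brevity):
> 
>     #include <cms/Connection.h>
>     #include <cms/Session.h>
>     #include <cms/Topic.h>
>     #include <cms/MessageConsumer.h>
>     #include <activemq/core/ActiveMQConnectionFactory.h>
> 
>     // Build one connection/session/consumer with the URI options
>     // listed above.
>     cms::MessageConsumer* createConsumer() {
>         activemq::core::ActiveMQConnectionFactory factory(
>             "tcp://broker-host:61616?jms.prefetchPolicy.all=2000");
> 
>         cms::Connection* connection = factory.createConnection();
>         cms::Session* session =
>             connection->createSession(cms::Session::AUTO_ACKNOWLEDGE);
>         cms::Topic* topic = session->createTopic(
>             "TEST.TOPIC.0?consumer.maximumPendingMessageLimit=4000");
>         cms::MessageConsumer* consumer = session->createConsumer(topic);
> 
>         connection->start();
>         return consumer;
>     }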
> 
> Note that while both the original TopicListener and SimpleAsyncConsumer
> examples used asynchronous dispatch, I have modified both to do synchronous
> receives in their own threads. For the C++ consumer, this results in 3
> threads per connection, and I have been testing with 8 connections. One
> experiment I want to do today is to revert to asynchronous dispatch, which
> I assume will bring me back to 2 threads per connection.
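> 
> For reference, reverting to asynchronous dispatch just means registering
> a cms::MessageListener instead of looping on receive(), along these lines
> (a sketch with one listener per consumer so the counter needs no locking;
> names are illustrative):
> 
>     #include <cms/Message.h>
>     #include <cms/BytesMessage.h>
>     #include <cms/MessageListener.h>
> 
>     // Asynchronous version of the byte-counting consumer: the CMS
>     // dispatch thread calls onMessage(), so no dedicated receive
>     // thread is needed.
>     class ByteCountingListener : public cms::MessageListener {
>     public:
>         ByteCountingListener() : totalBytes(0) {}
>         virtual ~ByteCountingListener() {}
> 
>         virtual void onMessage(const cms::Message* message) {
>             const cms::BytesMessage* bytesMessage =
>                 dynamic_cast<const cms::BytesMessage*>(message);
>             if (bytesMessage != NULL) {
>                 totalBytes += bytesMessage->getBodyLength();
>             }
>         }
> 
>         unsigned long long getTotalBytes() const { return totalBytes; }
> 
>     private:
>         unsigned long long totalBytes;
>     };
> 
>     // Registered once per consumer:
>     //   consumer->setMessageListener(&listener);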
> 
> I still have more investigation that I want to do, and it is possible that
> this investigation will yield enough specifics to file a bug report. It's
> also possible that I'll find that I've made some newbie mistake. However,
> in the research I've done so far I've seen indications that ActiveMQ-CPP
> 2.2.1 had known problems similar to these, and that at least one known
> deadlock related to CmsTemplate still exists in the 2.2.2 release.
> 
> I am writing because I would appreciate help from AMQ developers or any
> experienced users in the AMQ community who would be interested in checking
> my work to rule out newbie mistakes. I would be happy to make the source
> code for my modified examples available to anyone that is interested.
> 
> Some questions I would like to ask here: What is the right way to configure
> publishers, brokers, and listeners for high volumes of messages when some
> data loss is considered entirely acceptable? For example, suppose a system
> is allowed to have only a two-nines (99.0%) SLA (measured monthly) for
> message delivery, if that is what it takes to achieve high stability. Can
> the broker be configured such that it will never deadlock, even if a
> consumer deadlocks?
> 
> Thanks,
> Jim Lloyd
> Principal Architect
> Silver Tail Systems Inc.
