OK, so to check that I understand correctly, and to seek clarification on some points...
You have potentially 30 application instances, each with 5 connections, 20 sessions per connection, and 2 consumers on every one of the 6000 priority queues (at 600 consumers per session), giving up to 150 (30x5) connections, 3000 (30x5x20) sessions, and 360000 (30x2x6000) consumers? The consumers would only require 600 (360000/600) sessions, so can I assume the other 2400 sessions would be used for publishers, or have I misinterpreted something? (I am also unclear on the '20-30' vs '15'.) How are the sessions for the consumers spread across the connections: all on 1 connection, 4 on each of the 5 connections, something else?

Although you are ultimately looking to increase performance by batching, it sounds like it is really the application processing steps you want to speed up, by supplying more data at once, rather than explicitly decreasing the actual messaging overhead (which, if round trips to the broker are what bounds performance, can mean larger batches do increase message throughput).

Although you would like processing across the queues to be fair, you don't actually have any explicit ordering requirements such as 'after processing messages from Queue X we must process Queue Foo'. If each queue currently has up to 60 (30x2) consumers competing for its messages, does this mean you have no real ordering requirements (discounting priorities) when processing the messages on each queue, i.e. it doesn't matter which application instance gets a particular message, and particular consumers could get and process the first and third messages whilst a slower consumer actually got, and later finished processing, the second message?
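[Editor's note: to make the arithmetic above concrete, here is a quick sanity check of the totals. The per-instance figures (5 connections, 20 sessions per connection, 2 consumers on each of 6000 queues, 600 consumers per session) come from the thread; the class and method names are purely illustrative.]

```java
// Sanity check of the topology totals quoted in the question above.
public class TopologyMath {
    static final int INSTANCES = 30;                     // up to 30 client servers
    static final int CONNECTIONS_PER_INSTANCE = 5;
    static final int SESSIONS_PER_CONNECTION = 20;
    static final int QUEUES = 6000;
    static final int CONSUMERS_PER_QUEUE_PER_INSTANCE = 2;
    static final int CONSUMERS_PER_SESSION = 600;        // 600 queues handled per session

    static int connections() { return INSTANCES * CONNECTIONS_PER_INSTANCE; }                  // 150
    static int sessions()    { return connections() * SESSIONS_PER_CONNECTION; }               // 3000
    static int consumers()   { return INSTANCES * CONSUMERS_PER_QUEUE_PER_INSTANCE * QUEUES; } // 360000
    static int consumerSessions() { return consumers() / CONSUMERS_PER_SESSION; }              // 600
    static int spareSessions()    { return sessions() - consumerSessions(); }                  // 2400, publishers?
}
```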
I ask because if you try to batch the messages on queues with multiple consumers and no prefetch (or even with prefetch), it is unlikely that a consumer would receive a sequential batch-sized group of messages (without introducing message grouping to the mix, that is); rather, it would get a message followed by other messages with one or more intermediate 'gaps' where competing consumers received those messages. Is that acceptable to whatever batched processing you are likely to be doing?

You mentioned possibly only 100 queues servicing batch messages. Did you mean that you could know/decide in advance which those queues are, i.e. they are readily identifiable in advance, or could it be any 100 queues based on some condition at a given point in time?

Robbie

On 16 July 2012 16:54, Praveen M <[email protected]> wrote:
> Hi Robbie. Thank you for writing back. Please see inline for answers to
> some of the questions you had.
>
> On Mon, Jul 16, 2012 at 4:40 AM, Robbie Gemmell <[email protected]> wrote:
>
> > Hi Praveen,
> >
> > I have talked this over with some of the others here, and tend to agree
> > with Gordon and Rajith that mixing asynchronous and synchronous consumers
> > in that fashion isn't a route I would really suggest; using two sessions
> > makes for complication around transactionality and ordering, and I don't
> > think it will work on a single session.
> >
> > We do have some ideas you could potentially use to implement batching in
> > the application to improve performance, but there are various subtleties
> > to consider that might heavily influence our suggestions. As such we
> > really need a good bit more detail around the use case to actually give
> > a reasoned answer. For example:
> >
> > - How many connections/sessions/consumers/queues are actually in use?
> >
> In our current system, we have 20-30 client servers talking to our Qpid
> messaging server.
> We have 5 connections, 20 sessions/connection, 2 consumers/queue from a
> single client server's standpoint (so all the numbers should be multiplied
> by a max factor of 30, since we could have up to 30 client servers).
> We create 6000 queues overall in our Qpid messaging server.
>
> > - Are there multiple consumers on each/any of the queues at the same
> > time?
> >
> Yes. To explain this a little bit:
> We have about 15 client servers consuming messages.
> We have 20 sessions (threads) consuming messages per client server. We
> have broken the 6000 queues into 10 buckets, and have 2 sessions (threads)
> listening/consuming on every 600 queues. Hence, an individual session
> might try to listen and consume from up to 600 queues on the same thread.
>
> > - What if any ordering requirements are there on the message processing
> > (either within each queue or across all the queues)?
> >
> Across all queues, we'd like to process in a round-robin fashion to ensure
> fairness across the queues. We achieve this now by turning off prefetching
> (we're using prefetch 1, which works well).
> Within each queue: all our queues are priority queues, so we process based
> upon priority order.
>
> > - What is the typical variation of message volumes across the queues
> > that you are looking to balance?
> >
> Volumes vary quite a bit between queues (based upon the service the queue
> is tied to). Some queues have relatively low traffic, some have bursty
> traffic, some have consistently high traffic, and some have slow
> consumers.
> Our numbers peak at around a million messages per day for a busy queue.
>
> > - What are the typical message sizes?
> >
> Message sizes are typically around 1KB-2KB.
>
> > - How many messages might you potentially be looking to batch?
> >
> The batch sizes are typically provided by our client applications, and
> typically in the order of 10-50.
>
> > - What is the typical processing time in onMessage() now?
> > Would this vary as a direct multiple of the number of messages batched,
> > or by some other scaling?
> >
> The onMessage() callback invokes an application service, so I can't say
> exactly... but with batching, the processing time is typically quite a bit
> less than the direct multiple of the number of messages batched.
>
> The most typical use case where batching helps us is when a database query
> is invoked with the batched messages, thus performing a bulk operation.
> This can be very expensive for us if we do it one-by-one instead of
> batching the database query.
> Also, batch message traffic is typically bursty, and our processing times
> are quite high. From our current data, even though we have a
> multiple-consumer setup, batching helps us process efficiently for
> applications which process messages in bulk.
>
> Also, out of all our queues, I would say only about 100 of them would be
> servicing batch messages.
>
> Our current messaging infrastructure supports batch messages, and hence we
> have a lot of dependent code written which expects batching. Getting away
> from it now might be quite tough at this point, hence I'd like to
> implement a pseudo-batch on top of Qpid. My original thought was around
> using 2 sessions, onMessage() and a synchronous consumer. I don't think we
> have much concern with transactionality, as we keep our own reference to
> each message in our database to guarantee transactionality.
>
> Do let me know what you think; I'd love to hear if you can think of
> alternate approaches to this problem.
>
> Hope to hear from you soon.
>
> Thanks,
> Praveen
>
> > Regards,
> > Robbie
> >
> > On 12 July 2012 17:53, Praveen M <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to explore if there are ways to batch message processing.
> > > Batching message processing would help us improve performance for
> > > some of our use cases, where we could chunk messages and process them
> > > in a single callback.
> > >
> > > Has anyone here explored building a layer to batch messages?
> > >
> > > I am using the Java Broker and the Java client.
> > >
> > > I would like to stick to the JMS API as much as possible.
> > >
> > > This is what I currently have; still wondering if it'd work:
> > >
> > > 1) When the onMessage() callback is triggered, create a consumer and
> > > pull more messages to process from the queue the message was delivered
> > > from.
> > > 2) Pull messages up to my max chunk size, or up to the number of
> > > messages available in the queue.
> > > 3) Process all the messages together and commit on the session.
> > >
> > > I'd like to hear ideas on how to go about this.
> > >
> > > Thanks,
> > > --
> > > -Praveen
>
> --
> -Praveen
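[Editor's note: the draining logic behind steps 1-3 of the quoted proposal can be sketched independently of a live broker. This is only a sketch under assumptions: the `Supplier` below stands in for a synchronous JMS consumer's `receiveNoWait()`, which returns null when no message is immediately available, and the `BatchDrain`/`drainBatch` names are hypothetical. In a real implementation the returned batch would then be processed and the transacted session committed once.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class BatchDrain {
    /**
     * Starting from the message delivered to onMessage(), drain up to
     * (maxBatch - 1) further messages that are already available, stopping
     * early when the source has nothing ready (returns null) rather than
     * blocking. The whole batch is then processed and committed together.
     */
    public static <M> List<M> drainBatch(M first, Supplier<M> receiveNoWait, int maxBatch) {
        List<M> batch = new ArrayList<>();
        batch.add(first);
        while (batch.size() < maxBatch) {
            M next = receiveNoWait.get(); // null => no more messages ready now
            if (next == null) {
                break;
            }
            batch.add(next);
        }
        return batch;
    }
}
```

Note that, as discussed earlier in the thread, with competing consumers on the same queue the drained messages need not be consecutive: gaps can appear wherever other consumers received intervening messages.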
