I have never tried it, but the only caveats I know of are the ones we have already discussed: issues around ordering and transactionality (e.g. one commit succeeds and the other fails).
Robbie

On 18 July 2012 19:15, Praveen M <[email protected]> wrote:
> Hi Robbie,
>
> Thanks to you and team Qpid for writing and all the thoughts. Helps
> immensely.
>
> So, I understand that the second synchronous consumer idea is basically a
> little unclean, because of the second consumer invocation from the
> onMessage() callback. Are there any hidden caveats around this that I
> should be aware of?
>
> I'm going to try and implement that approach for a start and run through
> some tests to see if it works as desired.
>
> Rob's idea is interesting, I didn't think on those lines at all... however,
> yeah, it won't work since we're multiple consumers. Nice thought though.
>
> I'll keep you posted on how it goes :)
>
> Thanks,
> Praveen
>
> On Wed, Jul 18, 2012 at 10:08 AM, Robbie Gemmell
> <[email protected]> wrote:
>
> > Hi Praveen,
> >
> > So, as it turns out, after talking over the specifics of your use case
> > further, it doesn't seem like any of the things we considered will work
> > for you, so we don't really have anything better left to suggest than
> > the second synchronous consumer you proposed. Although we don't
> > especially like it, your use case does at least seem to be one that
> > shouldn't fall foul of some of the inherent limitations of doing that.
> >
> > (In case you are interested, the most promising idea was one Rob had
> > suggested, involving doing some things with queue bindings and an LVQ to
> > implement a kind of control queue which could be used to implement
> > triggering of batched synchronous consumption on the original payload
> > queues. Unfortunately, this won't really work with the multiple
> > consumers you have in place, since they won't necessarily want to
> > consume all of the messages on a given queue at once for fairness, and
> > it would then become necessary to somehow signal that further processing
> > was required by potentially another consumer.
> > Equally, removing the conflation on the control queue to compensate for
> > the multiple consumers would just lead to a situation where you would
> > invariably end up triggering activity against a queue that one or more
> > other consumers had already drained, and so this wouldn't be
> > particularly efficient.)
> >
> > As an aside, we were quite impressed by the number of consumers you are
> > using, it's just a smidge (up to 2 orders of magnitude) more than most
> > of our users typically have :)
> >
> > Robbie
> >
> > On 17 July 2012 15:05, Praveen M <[email protected]> wrote:
> >
> > > Hi Robbie,
> > >
> > > Thanks for writing back soon. Please see inline.
> > >
> > > On Mon, Jul 16, 2012 at 3:32 PM, Robbie Gemmell <
> > > [email protected]> wrote:
> > >
> > > > Ok, so to check I understand correctly, and seek clarification on
> > > > some points...
> > > >
> > > > You have potentially 30 application instances that have 5
> > > > connections, 20 sessions per connection, and are each creating 2
> > > > consumers on all 6000 priority queues (using 600 consumers per
> > > > session), thus giving up to 150 (30x5) connections, 3000 (30x5x20)
> > > > sessions, and 360000 (30x2x6000) consumers?
> > > >
> > > Yes, that is correct.
> > >
> > > > The consumers would only require 600 (360000/600) sessions, so can I
> > > > assume the other 2400 sessions would be used for publishers, or have
> > > > I misinterpreted something? (I am unclear on the '20-30' vs '15'.)
> > > >
> > > Yes, you are correct again. However, I forgot to tell you that we have
> > > dedicated connections for consumers (2 connections) vs publishers (5
> > > connections). Thus it'd be 600 sessions for consumers and 3000
> > > sessions for publishers.
> > >
> > > > How are the sessions for the consumers spread across the
> > > > connections: all on 1 connection, 4 on each of the 5 connections,
> > > > something else?
> > > >
> > > I have 2 connections dedicated to consumers (publishers won't use
> > > these connections; I try to isolate publisher from consumer
> > > connections). The 5 connections I mentioned above are used only by
> > > publishers. (Sorry for not being very clear earlier.)
> > >
> > > Since we have 2 connections for consumers, it's 10 consumer
> > > sessions/connection/server.
> > >
> > > > Although you are ultimately looking to increase performance by
> > > > batching, it is actually more the application processing steps you
> > > > are looking to speed up by supplying more data at once, rather than
> > > > explicitly decreasing the actual messaging overhead (which, if
> > > > bounding performance due to round trips to the broker, can mean
> > > > larger batches increasing message throughput).
> > > >
> > > Yes, that is correct.
> > >
> > > > Although you would like processing across the queues to be fair,
> > > > you don't actually have any explicit ordering requirements such as
> > > > 'after processing messages from Queue X we must process Queue Foo'.
> > > >
> > > Yes. There are no such ordering requirements.
> > >
> > > > If each queue currently has up to 60 (30x2) consumers competing for
> > > > the messages, does this mean you have no real ordering requirements
> > > > (discounting priorities) when processing the messages on each
> > > > queue, i.e. it doesn't matter which application instances get a
> > > > particular message, and say particular consumers could get and
> > > > process the first and third messages whilst a slower consumer
> > > > actually got and then later finished processing the second message?
> > > > I ask because, if you try to batch the messages on queues with
> > > > multiple consumers and no prefetch (or even with prefetch), it
> > > > isn't likely you would find consumers getting a sequential
> > > > batch-sized group of messages (without introducing message grouping
> > > > to the mix, that is) but rather instead get a message followed by
> > > > other messages with one or more intermediate 'gaps' where competing
> > > > consumers received those messages. Is that acceptable to whatever
> > > > batched processing it is you are likely to be doing?
> > > >
> > > Yes, we do not have any ordering requirement, and we're OK with
> > > exactly what you describe. Each message is independent of the others,
> > > and we do not process messages in a workflow order anyway. We do not
> > > use any message grouping (and do not plan to), and gaps are OK.
> > >
> > > > You mentioned possibly only 100 queues servicing batch messages.
> > > > Did you mean that you could know/decide in advance which those
> > > > queues are, i.e. they are readily identifiable in advance, or could
> > > > it just be any 100 queues based on some condition at a given point
> > > > in time?
> > > >
> > > Yes, we could decide in advance and identify batch queues if required.
> > >
> > > Thanks Robbie.
> > >
> > > > Robbie
> > > >
> > > > On 16 July 2012 16:54, Praveen M <[email protected]> wrote:
> > > >
> > > > > Hi Robbie. Thank you for writing back. Please see inline for
> > > > > answers to some of the questions you had.
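The 'gaps' Robbie describes above can be seen with a toy simulation. This is a minimal sketch in plain Java with no broker involved; the strict alternation between the two consumers, and the names `CompetingConsumers`/`alternate`, are assumptions for illustration only, since real delivery order depends on broker scheduling and prefetch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class CompetingConsumers {
    // Distribute queued messages between two consumers taking strict turns.
    // Strict alternation is an assumption for illustration; a real broker
    // interleaves deliveries based on scheduling and prefetch.
    public static List<List<Integer>> alternate(List<Integer> messages) {
        ArrayDeque<Integer> queue = new ArrayDeque<>(messages);
        List<Integer> a = new ArrayList<>();
        List<Integer> b = new ArrayList<>();
        while (!queue.isEmpty()) {
            a.add(queue.poll());
            if (!queue.isEmpty()) {
                b.add(queue.poll());
            }
        }
        return List.of(a, b);
    }

    public static void main(String[] args) {
        List<List<Integer>> split = alternate(List.of(1, 2, 3, 4, 5, 6));
        // Consumer A ends up with 1, 3, 5: a batch with gaps, not a
        // contiguous run from the queue.
        System.out.println("A=" + split.get(0) + " B=" + split.get(1));
    }
}
```

The point of the sketch is simply that any batch a single competing consumer assembles will interleave with what the other consumers took, which is fine here since the messages are independent.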
> > > > >
> > > > > On Mon, Jul 16, 2012 at 4:40 AM, Robbie Gemmell <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi Praveen,
> > > > > >
> > > > > > I have talked this over with some of the others here, and tend
> > > > > > to agree with Gordon and Rajith that mixing asynchronous and
> > > > > > synchronous consumers in that fashion isn't a route I would
> > > > > > really suggest; using two sessions makes for complication
> > > > > > around transactionality and ordering, and I don't think it will
> > > > > > work on a single session.
> > > > > >
> > > > > > We do have some ideas you could potentially use to implement
> > > > > > batching in the application to improve performance, but there
> > > > > > are various subtleties to consider that might heavily influence
> > > > > > our suggestions. As such, we really need a good bit more detail
> > > > > > around the use case to actually give a reasoned answer. For
> > > > > > example:
> > > > > >
> > > > > > - How many connections/sessions/consumers/queues are actually
> > > > > > in use?
> > > > > >
> > > > > In our current system, we have 20-30 client servers talking to
> > > > > our Qpid messaging server. We have 5 connections, 20
> > > > > sessions/connection, 2 consumers/queue from a single client
> > > > > server's standpoint (so all the numbers should be multiplied by a
> > > > > max factor of 30, since we could have up to 30 client servers).
> > > > > We create 6000 queues overall in our Qpid messaging server.
> > > > >
> > > > > > - Are there multiple consumers on each/any of the queues at the
> > > > > > same time?
> > > > > >
> > > > > Yes. To explain this a little bit:
> > > > >
> > > > > We have about 15 client servers consuming messages. We have 20
> > > > > sessions (threads) consuming messages per client server. We have
> > > > > broken the 6000 queues into 10 buckets, and have 2 sessions
> > > > > (threads) listening/consuming on every 600 queues. Hence, an
> > > > > individual session might try to listen and consume from up to 600
> > > > > queues on the same thread.
> > > > >
> > > > > > - What, if any, ordering requirements are there on the message
> > > > > > processing (either within each queue or across all the queues)?
> > > > > >
> > > > > Across all queues, we'd like to process in a round-robin fashion
> > > > > to ensure fairness across the queues. We achieve this now by
> > > > > turning down prefetching (we're using prefetch 1, which works
> > > > > well). Within a queue, all our queues are priority queues, so we
> > > > > process based upon priority order.
> > > > >
> > > > > > - What is the typical variation of message volumes across the
> > > > > > queues that you are looking to balance?
> > > > > >
> > > > > Volumes vary quite a bit between queues (based upon the service
> > > > > the queue is tied to). Some queues have relatively low traffic,
> > > > > some have bursty traffic, some have consistently high traffic,
> > > > > and some have slow consumers. Our numbers are at a high of a
> > > > > million per day for a busy queue.
> > > > >
> > > > > > - What are the typical message sizes?
> > > > > >
> > > > > Message sizes are typically around 1KB-2KB.
> > > > >
> > > > > > - How many messages might you potentially be looking to batch?
> > > > > >
> > > > > The batch sizes are typically provided from our client
> > > > > applications, and typically they're in the order of 10-50.
> > > > >
> > > > > > - What is the typical processing time in onMessage() now? Would
> > > > > > this vary as a direct multiple of the number of messages
> > > > > > batched, or by some other scaling?
> > > > > >
> > > > > The onMessage() callback invokes an application service, so I
> > > > > can't say exactly... but with the effect of batching, the
> > > > > processing time is typically much less than the direct multiple
> > > > > of the number of messages batched.
> > > > >
> > > > > The most typical use case for us where batching messages helps is
> > > > > when a database query is invoked with the batched messages, thus
> > > > > performing a bulk operation. This can be very expensive for us if
> > > > > we do this one by one instead of batching the database query.
> > > > > Also, batch message traffic is typically bursty, and our
> > > > > processing times are quite high. From our current data, even
> > > > > though we have a multiple-consumer setup, batching helps us
> > > > > process efficiently for applications which process messages in
> > > > > bulk.
> > > > >
> > > > > Also, out of all our queues, I would say only about 100 of them
> > > > > would be servicing batch messages.
> > > > >
> > > > > Our current messaging infrastructure supports batch messages, and
> > > > > hence we have a lot of dependent code written which expects
> > > > > batching. Getting out of it now might be quite tough at this
> > > > > point, hence I'd like to implement a pseudo-batch on top of Qpid.
> > > > > My original thought was around using 2 sessions, onMessage() and
> > > > > a synchronous consumer. I don't think we have much concern with
> > > > > transactionality, as we have our own reference to each message in
> > > > > our database to guarantee transactionality.
> > > > >
> > > > > Do let me know what you think, and I'd love to hear if you can
> > > > > think of alternate approaches to this problem.
> > > > >
> > > > > Hope to hear from you soon.
> > > > >
> > > > > Thanks,
> > > > > Praveen
> > > > >
> > > > > > Regards,
> > > > > > Robbie
> > > > > >
> > > > > > On 12 July 2012 17:53, Praveen M <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm trying to explore if there are ways to batch message
> > > > > > > processing. Batching message processing would help us improve
> > > > > > > performance for some of our use cases, where we could chunk
> > > > > > > messages and process them in a single callback.
> > > > > > >
> > > > > > > Has anyone here explored building a layer to batch messages?
> > > > > > >
> > > > > > > I am using the Java Broker and the Java client.
> > > > > > >
> > > > > > > I would like to stick to the JMS API as much as possible.
> > > > > > >
> > > > > > > This is what I currently have, still wondering if it'd work:
> > > > > > >
> > > > > > > 1) When the onMessage() callback is triggered, create a
> > > > > > > consumer and pull more messages to process from the queue the
> > > > > > > message was delivered from.
> > > > > > > 2) Pull messages up to my max chunk size, or up to the number
> > > > > > > of messages available in the queue.
> > > > > > > 3) Process all the messages together and commit on the
> > > > > > > session.
> > > > > > >
> > > > > > > I'd like to hear ideas on how to go about this.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > --
> > > > > > > -Praveen
> > > > >
> > > > > --
> > > > > -Praveen
> > >
> > > --
> > > -Praveen
>
> --
> -Praveen
>
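The three steps in Praveen's original proposal amount to a drain loop. Below is a minimal sketch of that loop, not Qpid-specific code: the `Supplier` stands in for JMS `MessageConsumer.receiveNoWait()` (which returns null when no message is immediately available) so the logic can be shown without a broker, and the names `BatchDrain`, `drain`, and `maxChunk` are illustrative. In a real consumer you would pass `() -> consumer.receiveNoWait()` and commit the transacted session after processing the batch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class BatchDrain {
    // Pull messages until the chunk is full or the source reports empty
    // (null), mirroring steps 1-2 of the proposal. Step 3 (process and
    // commit) happens on the returned batch.
    public static <T> List<T> drain(Supplier<T> receiveNoWait, int maxChunk) {
        List<T> batch = new ArrayList<>();
        while (batch.size() < maxChunk) {
            T message = receiveNoWait.get();   // non-blocking pull
            if (message == null) {
                break;                          // queue drained
            }
            batch.add(message);
        }
        return batch;
    }

    public static void main(String[] args) {
        // Simulate a queue holding 7 messages with a max chunk size of 5.
        ArrayDeque<String> queue = new ArrayDeque<>();
        for (int i = 1; i <= 7; i++) {
            queue.add("msg-" + i);
        }
        List<String> first = drain(queue::poll, 5);
        List<String> second = drain(queue::poll, 5);
        System.out.println(first.size() + " " + second.size()); // prints "5 2"
    }
}
```

Note that with multiple competing consumers and prefetch 1, each call to the real `receiveNoWait()` races the other consumers, so batches smaller than `maxChunk` are the normal case, not an error.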
