See in-line.

On 26 April 2015 at 05:37, Tim Bain <tb...@alumni.duke.edu> wrote:
> James,
>
> The prefetch buffer is a buffer of messages in the ActiveMQ code in the
> client process that holds messages that have been dispatched from the
> broker to the client but that haven't yet been handed over to the client.

To clarify the location of this pre-fetch buffer, is it inside the broker
itself? Allocated to the client that is connected, but not to individual
consumers?

> The purpose is to keep some number of messages in memory and available for
> the consumer to handle, to ensure that the consumer never has to wait for a
> message to be pulled from the broker. This lets the consumer consume as
> quickly as possible. The broker will continue dispatching messages to the
> client until the client has a full prefetch buffer, after which point it
> will dispatch one message for every ack it gets back from the consumer.

Where "client" is in fact inside the broker, at the network edge, listening
on the socket where the client with consumers is connected?

> When you have multiple concurrent consumers, the broker will round-robin
> messages between consumers that don't have full prefetch buffers; it won't
> batch up a full prefetch buffer's worth for Consumer 1 before sending any
> messages to Consumers 2-4, which is what I think you were worried about.

I was more conceptually thinking of a pre-fetch buffer being inside the
client that is connected to the broker - i.e. the messages have already gone
across the network and are therefore instantly available for delivery to
consumers. But I think when you talk of "client" you actually mean a
resource within the broker marshalling messages out to a session within
which multiple consumers are present.
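Tim's round-robin description can be sketched as a small simulation. This is my own simplified model for illustration (the class and method names are hypothetical, and this is not ActiveMQ's actual dispatch code): messages are handed out one at a time to whichever consumer next has spare prefetch capacity, rather than filling one consumer's buffer before the next.

```java
import java.util.*;

// Simplified model of broker-side dispatch with prefetch buffers
// (an illustration of the round-robin behaviour described above,
// NOT ActiveMQ's actual dispatch code).
public class PrefetchDispatchSketch {

    // Dispatch messages 1..messageCount round-robin to consumers whose
    // prefetch buffers are not yet full. Returns each consumer's buffer.
    public static List<List<Integer>> dispatch(int messageCount,
                                               int consumers,
                                               int prefetchLimit) {
        List<List<Integer>> buffers = new ArrayList<>();
        for (int i = 0; i < consumers; i++) buffers.add(new ArrayList<>());
        int next = 0; // round-robin pointer
        for (int msg = 1; msg <= messageCount; msg++) {
            // find the next consumer with spare prefetch capacity
            int tried = 0;
            while (tried < consumers && buffers.get(next).size() >= prefetchLimit) {
                next = (next + 1) % consumers;
                tried++;
            }
            if (tried == consumers) break; // all full: broker holds the rest
            buffers.get(next).add(msg);
            next = (next + 1) % consumers;
        }
        return buffers;
    }

    public static void main(String[] args) {
        // 10 messages, 4 consumers, prefetch limit 3: messages are
        // interleaved across consumers, not batched per consumer.
        System.out.println(dispatch(10, 4, 3));
        // prints [[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]
    }
}
```

Note how Consumer 1 ends up with messages 1, 5, 9 rather than 1, 2, 3 - i.e. no per-consumer batching, which is the behaviour Tim says the broker exhibits.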
> If there are a non-zero number of messages in the client's prefetch buffer
> and receive() is called, the thread should simply go grab the first message
> in the buffer and return it, so the timeout should not elapse unless the
> broker's host is HEAVILY loaded or locking somehow delays thread execution
> or you get unlucky enough to catch a full GC right then. If there are no
> messages in the buffer, the thread should wait for the timeout period to
> see if a message shows up, and either return that message when it does or
> return with no message after the timeout elapses. In both scenarios, I
> would expect that the consumer would remain connected and so no redelivery
> would apply; receive() should just be looking to see whether a message came
> across the pre-existing connection, but it should not be making connections
> nor disconnecting if the timeout interval elapses without a message. (If
> you're disconnecting after each message, that's an anti-pattern as I
> understand it, and you should probably rethink your approach.)

I think our machine itself gets busy. We know that connecting and sending to
ActiveMQ via STOMP sometimes ends up with a socket wait of over 10 seconds.
Identifying why could be interesting.

> All of that is to say, I don't think that the elapsing of a receive()
> timeout without receiving a message should do anything that would cause a
> message redelivery, so I wonder if that's a red herring and the problem is
> actually something else. Do you see any messages in your client or broker
> logs indicating that the client disconnected and reconnected, that the
> connection's inactivity monitor detected the connection to be inactive, or
> that the consumer was aborted as a slow consumer?

I have not seen any indication that the client is reconnecting. We have
updated the receive time-out from 1000ms to 10000ms, retaining the default 6
delivery attempts, and so far nothing new has appeared in the DLQ.
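The receive(timeout) semantics Tim describes map closely onto a timed poll of an in-memory queue. A minimal sketch of that analogy (my own illustration using a BlockingQueue; the class and method names are invented, and this is not the real ActiveMQ client code):

```java
import java.util.concurrent.*;

// Sketch of the receive(timeout) semantics described above: if the local
// prefetch buffer already holds a message, return it immediately; if not,
// wait up to the timeout and return null - no disconnect, no redelivery.
// (An analogy built on a BlockingQueue, NOT the real ActiveMQ client.)
public class ReceiveTimeoutSketch {
    private final BlockingQueue<String> prefetchBuffer = new LinkedBlockingQueue<>();

    // Stands in for the broker pushing a message into the client-side buffer.
    public void onDispatchFromBroker(String msg) {
        prefetchBuffer.add(msg);
    }

    // Mirrors MessageConsumer.receive(timeoutMillis): a null return just
    // means the timeout elapsed without a message, nothing more.
    public String receive(long timeoutMillis) throws InterruptedException {
        return prefetchBuffer.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ReceiveTimeoutSketch consumer = new ReceiveTimeoutSketch();
        consumer.onDispatchFromBroker("m1");
        System.out.println(consumer.receive(1000)); // prints m1, returns immediately
        System.out.println(consumer.receive(50));   // prints null after ~50ms
    }
}
```

The key point of the analogy is that an elapsed timeout is just a null return from a poll on an open connection; nothing in it tears the connection down or hands the message back to the broker.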
> BTW, for your update of that page on the wiki, what client connection
> timeout did you have in mind? (I can think of at least three things that
> match that phrase: timeout on establishing the initial connection to the
> broker, inactivity on the connection that leads to a disconnect, and the
> receive() timeout you referenced above.) I think that a disconnect due to
> connection inactivity where the inactivity monitor was in use would indeed
> produce a redelivery, but if you meant the elapsing of a receive() timeout
> as described above, I'm not convinced that that's accurate (and if it turns
> out not to be, we should pull that edit back off the wiki page to avoid
> confusing people). But one thing that I believe is missing from that list
> is when a consumer disconnects from the broker (for any reason) where
> messages have been dispatched to the consumer but not acknowledged (i.e.
> they're in the consumer's prefetch buffer or they're the message the
> consumer was processing at the time of the disconnect under certain
> acknowledgement modes).

Surely if a client takes longer than the receive time-out to process a
message, a re-delivery will occur? If not, what does happen? That was the
documentation update I had intended, under what I thought was a very safe
interpretation so far.

> Tim
>
> On Fri, Apr 24, 2015 at 8:10 AM, James Green <james.mk.gr...@gmail.com>
> wrote:
>
> > I need to understand the pre-fetch limit and receive time-out
> > interaction.
> >
> > We have four concurrent consumers in our route. Does each receive
> > messages in batches of the pre-fetch limit?
> >
> > At what point does the receive time-out start and end?
> >
> > In our case each client performs a number of db queries then fires a new
> > message at the broker before the route is complete. Typically this may
> > take more than 1 second under load.
> > A 10s time-out only makes sense if the pre-fetching is not included, but
> > then that suggests client-calculated time-outs communicated back to the
> > broker, which also makes no sense.
> >
> > So, I am clear we need to better understand what's under the hood here!
> >
> > James
> >
> > On 24 April 2015 at 14:39, Gary Tully <gary.tu...@gmail.com> wrote:
> >
> > > Tim, steady, I suggested it *may* be relevant :-)
> > > With camel and transactions - ie: spring dmlc, connection pools and
> > > cache levels - anything is possible w.r.t. consumer/session/connection
> > > state, because there are so many variables in the mix.
> > >
> > > With activemq and prefetch, every consumer disconnect will result in
> > > redeliveries. The trick is figuring out whether the prefetched
> > > messages were actually delivered to the consumer, so that the
> > > delivery count can reflect the application's view of the world; that
> > > is not an exact science.
> > >
> > > On 24 April 2015 at 13:51, Tim Bain <tb...@alumni.duke.edu> wrote:
> > > > Gary,
> > > >
> > > > If I understood that JIRA correctly, the bug only occurs when the
> > > > client disconnects, which doesn't sound like what James is doing
> > > > (nothing in his description indicated to me that his client wasn't
> > > > staying up and connected the whole time), so it doesn't sound like
> > > > your fix would resolve (nor explain) his problem. And although I'm
> > > > all about workarounds when I know there's a fix in a future version,
> > > > I'm not sure that's the case here, and I don't want to give him a
> > > > workaround at the expense of actually finding and fixing a bug.
> > > >
> > > > The two things I know of that can cause message redelivery are 1)
> > > > client disconnection with queues and durable topic subscriptions,
> > > > and 2) unhandled exceptions in the client message handler code.
> > > > James, might #2 be going on here?
> > > > And Gary (or anyone else), are there any other possible causes of
> > > > redelivery that I don't know about?
> > > >
> > > > Tim
> > > >
> > > > On Apr 24, 2015 4:59 AM, "Gary Tully" <gary.tu...@gmail.com> wrote:
> > > >
> > > >> To avoid the redelivered messages getting sent to the DLQ, changing
> > > >> the default redelivery policy max from 6 to infinite will help.
> > > >>
> > > >> You can do this in the broker url passed to the jms connection
> > > >> factory. It may also make sense to reduce the prefetch if consumers
> > > >> come and go without consuming the prefetch, which seems to be the
> > > >> case.
> > > >>
> > > >> tcp://..:61616?jms.prefetchPolicy.all=100&jms.redeliveryPolicy.maximumRedeliveries=-1
> > > >>
> > > >> On 23 April 2015 at 17:14, James Green <james.mk.gr...@gmail.com>
> > > >> wrote:
> > > >> > Hi,
> > > >> >
> > > >> > We are not overriding, so the defaults of a 1s timeout on the
> > > >> > receive() and 1,000 prefetch are in play.
> > > >> >
> > > >> > We are updating the connection URI to set a much higher timeout.
> > > >> >
> > > >> > Interestingly, PHP sending to the very same broker via STOMP gets
> > > >> > send() failures with a 2 second timeout specified. With a 10
> > > >> > second timeout the frequency of this is reduced.
> > > >> >
> > > >> > I have fired up the latest hawt.io jar and connected to this
> > > >> > broker, however the Health and Threads parts are entirely blank.
> > > >> > The queues are all visible, yet "browse" of ActiveMQ.DLQ shows
> > > >> > none of the 3,000+ accumulated messages. Wondering where to go
> > > >> > next?
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > James
> > > >> >
> > > >> > On 23 April 2015 at 13:35, Gary Tully <gary.tu...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> >> What sort of timeout is on the receive(...) from spring dmlc,
> > > >> >> and what is the prefetch for that consumer?
> > > >> >> It appears that the message is getting dispatched but not
> > > >> >> consumed: the connection/consumer dies and the message is
> > > >> >> flagged as a redelivery. Then the before-delivery check on the
> > > >> >> delivery counter kicks the message to the dlq. So this must be
> > > >> >> happening 6 times.
> > > >> >>
> > > >> >> I just pushed a tidy-up of some of the redelivery semantics -
> > > >> >> there was a bug there that would cause the redelivery counter to
> > > >> >> increment in error... so that may be relevant [1].
> > > >> >> A short-term solution would be to ensure infinite or a very
> > > >> >> large number of redeliveries, up from the default 6. That can be
> > > >> >> provided in the broker url.
> > > >> >>
> > > >> >> [1] https://issues.apache.org/jira/browse/AMQ-5735
> > > >> >>
> > > >> >> On 23 April 2015 at 13:08, James Green <james.mk.gr...@gmail.com>
> > > >> >> wrote:
> > > >> >> > We have a camel route consuming from ActiveMQ (5.10.0 with
> > > >> >> > KahaDB) and frequently get a DLQ entry without anything logged
> > > >> >> > through our errorHandler.
> > > >> >> >
> > > >> >> > The only thing we have to go on is a dlqFailureCause header
> > > >> >> > which says:
> > > >> >> >
> > > >> >> > java.lang.Throwable: Exceeded redelivery policy
> > > >> >> > limit:RedeliveryPolicy {destination = null,
> > > >> >> > collisionAvoidanceFactor = 0.15, maximumRedeliveries = 6,
> > > >> >> > maximumRedeliveryDelay = -1, initialRedeliveryDelay = 1000,
> > > >> >> > useCollisionAvoidance = false, useExponentialBackOff = false,
> > > >> >> > backOffMultiplier = 5.0, redeliveryDelay = 1000}, cause:null
> > > >> >> >
> > > >> >> > These are happening apparently at random. The route is marked
> > > >> >> > transacted, and is backed by Spring Transactions, itself
> > > >> >> > backed by Narayana.
> > > >> >> > Our debugging indicates that our route never receives the
> > > >> >> > message from AMQ prior to it hitting the DLQ. We have switched
> > > >> >> > on DEBUG logging for org.apache.activemq but, other than being
> > > >> >> > swamped with even more logs, we've observed nothing notable.
> > > >> >> >
> > > >> >> > Any ideas where to go from here? It is impossible to say which
> > > >> >> > of the several thousand messages per day will go this way, so
> > > >> >> > an attached debugger is out of the question.
> > > >> >> >
> > > >> >> > Our log4j config fragment:
> > > >> >> >
> > > >> >> >     <Logger name="com" level="WARN"/>
> > > >> >> >     <Logger name="org" level="WARN"/>
> > > >> >> >     <Logger name="org.apache.camel" level="DEBUG"/>
> > > >> >> >     <Logger name="org.apache.activemq" level="DEBUG"/>
> > > >> >> >     <Logger name="org.springframework.orm.jpa" level="DEBUG"/>
> > > >> >> >     <Logger name="org.springframework.transaction" level="DEBUG"/>
> > > >> >> >
> > > >> >> > Thanks,
> > > >> >> >
> > > >> >> > James
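Gary's "before-delivery check on the delivery counter" can be illustrated with a small sketch. This is my own simplified model (invented class and method names), not the broker's actual code; it just shows the accounting implied by maximumRedeliveries = 6 in the RedeliveryPolicy above, and by -1 meaning unlimited as in jms.redeliveryPolicy.maximumRedeliveries=-1.

```java
// Sketch of the redelivery accounting described above: each aborted
// delivery increments a per-message redelivery counter, and when the
// counter exceeds the policy's maximumRedeliveries the before-delivery
// check routes the message to the DLQ instead. A simplified model for
// illustration, NOT ActiveMQ's actual implementation.
public class RedeliverySketch {

    // Decide what happens to a message before the broker dispatches it.
    // maximumRedeliveries = -1 models the "unlimited" setting.
    public static String dispositionBeforeDelivery(int redeliveryCounter,
                                                   int maximumRedeliveries) {
        if (maximumRedeliveries >= 0 && redeliveryCounter > maximumRedeliveries) {
            return "DLQ";
        }
        return "deliver";
    }

    public static void main(String[] args) {
        // default policy: maximumRedeliveries = 6
        System.out.println(dispositionBeforeDelivery(6, 6));  // prints deliver
        System.out.println(dispositionBeforeDelivery(7, 6));  // prints DLQ
        // unlimited redeliveries, as in Gary's suggested broker url
        System.out.println(dispositionBeforeDelivery(7, -1)); // prints deliver
    }
}
```

This is why James sees the dlqFailureCause only after the sixth redelivery: the counter has to climb past the policy limit, one aborted delivery at a time, before the message is diverted.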