Thanks for the detailed response. I generally agree that it's not flawed and
that the cause is likely my configuration. I'm taking steps to track it
down, but of course the broker is behaving itself now.

I do still think some defensive measures to keep the broker from shutting
down, regardless of configuration, would be a good idea. Is there a
mechanism in place that could alert me to potentially catastrophic issues
before they happen?

On 21 October 2017 at 14:08, Tim Bain <tb...@alumni.duke.edu> wrote:

> Responses inline.
>
> On Fri, Oct 20, 2017 at 5:46 AM, Lionel van den Berg <lion...@gmail.com>
> wrote:
>
> > Hi, thanks for the response.
> >
> > Some questions on these points from the troubleshooting.
> >
> >
> >    1. *It contains a pending message for a destination or durable topic
> >    subscription*
> >
> > This seems a little flawed: if a consumer over which I have little
> > control misbehaves, then my ActiveMQ broker can end up shutting down and
> > becoming unrecoverable. Is there some way of timing this out, or similar?
> >
>
> There are multiple ways of discarding messages that are not being consumed,
> which are detailed at
> http://activemq.apache.org/slow-consumer-handling.html
> (several of which it sounds like you're already using). Keep in mind that
> unconsumed DLQ messages are still unconsumed messages, so you'll want to
> make sure you address those as well;
> http://activemq.apache.org/message-redelivery-and-dlq-handling.html
> has additional information about handling messages in the context of the
> DLQ. And no, I wouldn't say it's flawed; it just means you have to do
> some configuration work that you haven't yet done.
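>
> For example, if you control the producers, one simple safeguard is to
> stamp every message with a time-to-live, so an unconsumed message
> eventually expires instead of pinning KahaDB data files forever. A minimal
> sketch using the standard JMS API (the broker URL, queue name, and TTL
> here are illustrative):
>
> import javax.jms.Connection;
> import javax.jms.Message;
> import javax.jms.MessageProducer;
> import javax.jms.Queue;
> import javax.jms.Session;
> import org.apache.activemq.ActiveMQConnectionFactory;
>
> public class ExpiringProducer {
>     public static void main(String[] args) throws Exception {
>         ActiveMQConnectionFactory factory =
>                 new ActiveMQConnectionFactory("tcp://localhost:61616");
>         Connection connection = factory.createConnection();
>         connection.start();
>         Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
>         Queue queue = session.createQueue("EXAMPLE.QUEUE");
>         MessageProducer producer = session.createProducer(queue);
>         // Expire anything not consumed within 7 days; expired messages
>         // can be discarded (or routed to the DLQ, depending on your
>         // configuration) rather than pinning data files indefinitely.
>         producer.setTimeToLive(7L * 24 * 60 * 60 * 1000);
>         Message message = session.createTextMessage("payload");
>         producer.send(message);
>         connection.close();
>     }
> }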
>
>
> > *2. It contains an ack for a message which is in an in-use data file - the
> > ack cannot be removed as a recovery would then mark the message for
> > redelivery*
> >
> > Same comment as 1.
> >
>
> Same response as for #1. There's one additional wrinkle (KahaDB keeps an
> entire data file alive because of a single live message, which in turn
> means the acks for later messages, stored in later data files, have to be
> kept as well), but that's been partially mitigated by the ability to
> compact acks by replaying them into the current data file, which should
> allow any data file that contains no live non-ack messages to be GC'ed.
> So a small portion of this is purely the result of KahaDB's design as a
> non-compacting data store, but it's a problem only when there's an old
> unacknowledged message, which takes us back to #1.
>
>
> > *3. The journal references a pending transaction*
> >
> > I'm not using transactions, but are there transactions under the hood?
> >
>
> No, this would only apply if you were directly using transactions, so this
> doesn't apply to you.
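>
> (For reference, "directly using transactions" means creating a transacted
> JMS session; reusing the connection and queue from the earlier sketch, it
> would look something like this:)
>
> // Until commit() or rollback() runs, the journal records a pending
> // transaction, which is what condition #3 refers to.
> Session txSession = connection.createSession(true, Session.SESSION_TRANSACTED);
> MessageProducer txProducer = txSession.createProducer(queue);
> txProducer.send(txSession.createTextMessage("batched work"));
> txSession.commit(); // without this, the transaction stays pending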
>
>
> > *4. It is a journal file, and there may be a pending write to it*
> >
> > Why would this be the case?
> >
>
> That can happen if we haven't finished flushing the file, since writes use
> a buffer-then-flush paradigm. This will be an infrequent situation and
> should affect only a small number of data files, so if you're having a
> problem with the number of files kept, it's not because of this; it's just
> included in the list for completeness.
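>
> (To illustrate the general paradigm - this is not ActiveMQ's actual
> journal code, just the shape of the pattern:)
>
> import java.io.BufferedOutputStream;
> import java.io.FileOutputStream;
>
> public class BufferThenFlush {
>     public static void main(String[] args) throws Exception {
>         try (BufferedOutputStream out =
>                 new BufferedOutputStream(new FileOutputStream("journal.dat"))) {
>             out.write("record".getBytes());
>             // The bytes may still be sitting in the in-memory buffer here;
>             // until flush()/close() completes, the file has a pending
>             // write and can't safely be deleted out from under the writer.
>             out.flush();
>         }
>     }
> }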
>
> > I'll see if I can change the logging settings; since the first
> > occurrence, the number of log files does not seem to have been an issue.
> > I have it configured to keep messages for 7 days, so regardless of the
> > above conditions I would have thought that on expiry the logs would be
> > cleaned up, and we wouldn't end up in a situation where the system stops
> > and cannot restart.
> >
>
> If you are configured as you describe, I would indeed expect log cleanup
> to happen that way, which means that either there's an undiscovered bug in
> our code or you're not configured the way you think you are.
>
> The page I linked to originally has instructions for how to determine which
> destinations have messages that are preventing the KahaDB data files from
> being deleted, which might let you investigate further (for example, by
> looking at the messages and their attributes to see if timestamps are being
> set correctly).
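>
> For example, you could browse a suspect destination and print each
> message's timestamp and expiration. A minimal sketch (the broker URL and
> queue name are illustrative):
>
> import java.util.Enumeration;
> import javax.jms.Connection;
> import javax.jms.Message;
> import javax.jms.Queue;
> import javax.jms.QueueBrowser;
> import javax.jms.Session;
> import org.apache.activemq.ActiveMQConnectionFactory;
>
> public class TimestampChecker {
>     public static void main(String[] args) throws Exception {
>         ActiveMQConnectionFactory factory =
>                 new ActiveMQConnectionFactory("tcp://localhost:61616");
>         Connection connection = factory.createConnection();
>         connection.start();
>         Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
>         Queue queue = session.createQueue("EXAMPLE.QUEUE");
>         QueueBrowser browser = session.createBrowser(queue);
>         Enumeration<?> messages = browser.getEnumeration();
>         while (messages.hasMoreElements()) {
>             Message m = (Message) messages.nextElement();
>             // A JMSExpiration of 0 means the message never expires, which
>             // would explain data files being pinned indefinitely.
>             System.out.printf("id=%s timestamp=%d expiration=%d%n",
>                     m.getJMSMessageID(), m.getJMSTimestamp(), m.getJMSExpiration());
>         }
>         connection.close();
>     }
> }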
>
> Tim
>
