> Is there a known issue with performance of selectors on 2.24.0 ? In
particular that may degrade over time.

I'm not aware of any particular issue, per se, but generally speaking,
using selectors on queue consumers is not good for performance. This is due
to the queue scanning required to match messages to the selector.
Performance won't necessarily degrade over time, but it will degrade as the
complexity of the selector increases, as the number of messages increases,
or as the number of consumers increases (or any combination thereof).
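
To make the queue use-case concrete, a queue consumer with a selector looks
roughly like the sketch below (JMS 2.0 API; the broker URL, queue name, and
selector string are placeholders, not taken from your setup):

    import javax.jms.ConnectionFactory;
    import javax.jms.JMSConsumer;
    import javax.jms.JMSContext;
    import javax.jms.Message;
    import javax.jms.Queue;
    import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

    public class QueueSelectorExample {
       public static void main(String[] args) throws Exception {
          ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
          try (JMSContext context = cf.createContext()) {
             Queue queue = context.createQueue("exampleQueue");
             // The broker may have to scan the messages in the queue to find
             // one matching this selector, which is what hurts performance as
             // messages, consumers, and selector complexity grow.
             JMSConsumer consumer = context.createConsumer(queue, "region = 'EU'");
             Message message = consumer.receive(5000);
             if (message != null) {
                System.out.println("Received " + message.getJMSMessageID());
             }
          }
       }
    }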

Couple of questions:
 - Are selectors in use with the same volume of messages and consumers on
the queues where you're not seeing any problems?
 - Have you gathered any thread dumps to see what the broker is doing
during these slowdowns? If so, what did you find? If not, could you gather
a handful of thread dumps so we can know what's happening over time?
 - Have you gathered any metrics related to CPU utilization during these
slowdowns? If so, what did you find?
 - Are there any "gaps" in the consumers' selectors such that some messages
may not actually match any selector and therefore will not be consumed?
 - How complex are your selectors?

Generally speaking, performance will be better with *topic* subscribers
using selectors. This is because the broker pre-sorts matching messages
into the corresponding subscription as they arrive at the topic, which means
the matching logic is performed less often than in the queue use-case.
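
For comparison, a shared durable topic subscription with a selector looks
something like this sketch (again, the topic name, subscription name, and
selector are placeholders):

    import javax.jms.ConnectionFactory;
    import javax.jms.JMSConsumer;
    import javax.jms.JMSContext;
    import javax.jms.Message;
    import javax.jms.Topic;
    import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

    public class TopicSelectorExample {
       public static void main(String[] args) throws Exception {
          ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
          try (JMSContext context = cf.createContext()) {
             Topic topic = context.createTopic("exampleTopic");
             // The selector is evaluated as each message arrives at the topic,
             // and matching messages are routed into the subscription's own
             // queue, so the consumer doesn't trigger a scan on every receive.
             JMSConsumer subscriber =
                   context.createSharedDurableConsumer(topic, "eu-subscription", "region = 'EU'");
             Message message = subscriber.receive(5000);
             if (message != null) {
                System.out.println("Received " + message.getJMSMessageID());
             }
          }
       }
    }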

I can't say for sure this is your problem. You may be hitting some other
issue. Hopefully your answers to my questions will help narrow it down.

FWIW, ActiveMQ Artemis 2.26.0 contains some performance optimizations for a
few uncommonly complex selector use-cases [1].


Justin

[1] https://issues.apache.org/jira/browse/ARTEMIS-3962

On Wed, Oct 5, 2022 at 9:42 AM <s...@cecurity.com.invalid> wrote:

> Hello,
>
> Thank you for your response. The clients refused the downtime at the time
> for the store dump, so we ended up with a live cleanup of the queue that
> they couldn't consume fast enough (it was holding ~20M response messages).
>
> The occurrence of that specific WARNING diminished significantly after
> that and it hasn't been seen again in over a week.
>
> We still have a suspected performance issue on that broker, though: after
> a few hours of uptime, consumption of that queue slows and messages pile up.
>
> My other queues on the same broker are at zero most of the time for an
> identical or similar volume over the same timeframe.
>
> At some point we had
>
> AMQ222144: Queue could not finish waiting executors. Try increasing the
> threadpool size
>
> which we did, by 50%; the warning disappeared but the slowdown did not.
>
> What is specific about the usage of that queue is that
>
> 1/ consumers use selectors to pick their messages of interest
>
> 2/ One of the client applications creates and destroys a lot of short-lived
> consumers. There may be more than 400 consumers connected at some point.
>
> Are there more specific documentation pointers available on sizing broker
> resources depending on usage patterns?
>
> Is there a known issue with performance of selectors on 2.24.0 ? In
> particular that may degrade over time.
>
> It's happening even when the queue has not reached the paging threshold
> and still fits entirely in memory, and no significant iowait has been
> spotted on the server side.
>
> Heap memory usage of the JVM seems stable and well below the allocation.
> Netty threads are not specified on the acceptor; the default seems to be
> 96 and most of them are in epoll wait.
>
> Regards,
>
>
> Le 22/09/2022 à 10:47, Clebert Suconic a écrit :
> > I don’t think it’s critical, but I wanted to check your data to be able
> > to help.
> > On Thu, Sep 22, 2022 at 4:26 AM Clebert Suconic <clebert.suco...@gmail.com>
> > wrote:
> >
> >> Also, inspect the file before you upload it.
> >>
> >> On Thu, Sep 22, 2022 at 4:25 AM Clebert Suconic <clebert.suco...@gmail.com>
> >> wrote:
> >>
> >>> Is there any chance you could add a data print of your journal to the
> >>> JIRA?
> >>>
> >>>
> >>> The Artemis data print has a --safe argument that would obfuscate any
> >>> data, leaving only the journal structure recorded.
> >>>
> >>>
> >>> ./artemis data print --safe
> >>>
> >>>
> >>> The system has to be stopped, or you could copy your journal to another
> >>> server for this operation.
> >>>
> >>> On Tue, Sep 20, 2022 at 6:24 AM <s...@cecurity.com.invalid> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> This is related to my ticket
> >>>> https://issues.apache.org/jira/browse/ARTEMIS-3992
> >>>>
> >>>> We still have occasional spurts of messages like
> >>>>
> >>>> 2022-09-20 10:32:43,913 WARN  [org.apache.activemq.artemis.journal]
> >>>> AMQ142007: Can not find record 268 566 334 during compact replay
> >>>> 2022-09-20 10:32:43,913 WARN  [org.apache.activemq.artemis.journal]
> >>>> AMQ142007: Can not find record 268 568 921 during compact replay
> >>>> 2022-09-20 10:32:43,913 WARN  [org.apache.activemq.artemis.journal]
> >>>> AMQ142007: Can not find record 268 567 786 during compact replay
> >>>> 2022-09-20 10:32:43,913 WARN  [org.apache.activemq.artemis.journal]
> >>>> AMQ142007: Can not find record 268 569 685 during compact replay
> >>>> 2022-09-20 10:32:43,913 WARN  [org.apache.activemq.artemis.journal]
> >>>> AMQ142007: Can not find record 268 569 347 during compact replay
> >>>> 2022-09-20 10:32:43,913 WARN  [org.apache.activemq.artemis.journal]
> >>>> AMQ142007: Can not find record 268 566 991 during compact replay
> >>>>
> >>>> and some consumers are dead slow (~4 msg/s). It may be their fault, but
> >>>> they struggle to find a client-side cause. The queue they are querying
> >>>> now holds 14M messages.
> >>>>
> >>>> I'm trying to determine
> >>>>
> >>>> a/ the severity of the problem exposed by those warnings. Can the store
> >>>> be corrupted? (We don't really have doubts about the physical layer; this
> >>>> is local storage on the server, with no errors reported by the OS.)
> >>>>
> >>>> b/ is there a possibility that these issues play a role in the
> >>>> performance problem?
> >>>>
> >>>> Still on 2.24.0, as described in the JIRA issue.
> >>>>
> >>>> Any ideas ?
> >>>>
> >>>> Regards,
> >>>>
> >>>> SL
> >>>>
> >>> --
> >>> Clebert Suconic
> >>>
> >> --
> >> Clebert Suconic
> >>
