Thank you for your response.

- No selectors on other queues.
- I checked the thread list on the Artemis console. At the time, 2 or 3 threads were in BLOCKED state, for example with this stack:

      org.apache.activemq.artemis.core.server.impl.QueueImpl.addConsumer(QueueImpl.java:1414)
      org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.<init>(ServerConsumerImpl.java:258)
      org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.createConsumer(ServerSessionImpl.java:572)
      org.apache.activemq.artemis.core.protocol.openwire.amq.AMQConsumer.init(AMQConsumer.java:156)
      org.apache.activemq.artemis.core.protocol.openwire.amq.AMQSession.createConsumer(AMQSession.java:190)
      org.apache.activemq.artemis.core.protocol.openwire.OpenWireConnection.addConsumer(OpenWireConnection.java:952)
      org.apache.activemq.artemis.core.protocol.openwire.OpenWireConnection$CommandProcessor.processAddConsumer(OpenWireConnection.java:1258)
      org.apache.activemq.command.ConsumerInfo.visit(ConsumerInfo.java:352)

  The others were all RUNNABLE, WAITING or TIMED_WAITING.

- The selectors are not complex, IMHO: they check a specific header for a prefix (<header> like '<prefix>%'), and there are no messages that match none of the selectors. (A minimal client-side sketch of this consumer pattern follows after this list.)

- The distribution seems balanced; there is no significant gap for one type of consumer.

- There are a few CPU spikes, but the server is almost always at least 85%-90% idle. The broker process seems to stay at 100-150% CPU, but there are 32 (logical) cores.
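For reference, a minimal sketch of what one of these selector consumers looks like on the client side, using the classic ActiveMQ (OpenWire) JMS client that the legacy application uses. The broker URL, queue name, header name and prefix are placeholders, not our real values:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;

    import org.apache.activemq.ActiveMQConnectionFactory;

    public class SelectorConsumerSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder URL, queue and header names.
            ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://broker:61616");
            Connection connection = cf.createConnection();
            try {
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("the.shared.queue");

                // Each consumer only picks the messages whose header starts with its own prefix.
                MessageConsumer consumer = session.createConsumer(queue, "routingKey LIKE 'clientA%'");

                Message message = consumer.receive(5000);
                // ... process the message ...

                consumer.close();
            } finally {
                connection.close();
            }
        }
    }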
A couple of follow-up findings:

- After a while, the log statements came back despite the thread pool increase:

      AMQ222172: Queue <redacted> was busy for more than 10 000 milliseconds. There are possibly consumers hanging on a network operation [org.apache.activemq.artemis.core.server]
      AMQ222144: Queue could not finish waiting executors. Try increasing the thread pool size

  This was several hours after it started to slow down, and at a time when consumption was totally floored (0 or a really small number of messages per minute).

- At that point, the client applications spawning lots of consumers were stopped. The other one, which uses a more common pattern with a static pool of 10 consumers, came back to life and happily consumed its own messages at its usual rate. When the other client component was restarted, consumption floored again and the thread pool warning came back from time to time.

Is it possible that opening/closing lots of consumers "wears down" the broker after some time? (A rough sketch of the two client patterns follows below.)

Side note: the most recent client application, with the static pool of consumers, uses the Core protocol; the legacy one spawning lots of consumers uses OpenWire with the classic ActiveMQ client lib.
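To make the difference between the two client patterns explicit, here is a rough sketch in plain JMS. The header/prefix names are placeholders and this is not the actual application code:

    import javax.jms.Connection;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;

    public class ConsumerPatternsSketch {

        // Legacy pattern: a short-lived consumer is created and closed around every receive.
        // Each createConsumer() call has to register a consumer on the queue on the broker
        // side (QueueImpl.addConsumer, the frame seen BLOCKED in the thread dump above).
        static Message receiveWithShortLivedConsumer(Session session, Queue queue, String prefix)
                throws JMSException {
            MessageConsumer consumer = session.createConsumer(queue, "routingKey LIKE '" + prefix + "%'");
            try {
                return consumer.receive(1000);
            } finally {
                consumer.close();
            }
        }

        // Newer pattern: a static pool of consumers is created once at startup and reused,
        // so the broker only ever sees a small, stable set of long-lived consumers.
        static MessageConsumer[] createStaticPool(Connection connection, Queue queue, String prefix,
                int size) throws JMSException {
            MessageConsumer[] pool = new MessageConsumer[size];
            for (int i = 0; i < size; i++) {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                pool[i] = session.createConsumer(queue, "routingKey LIKE '" + prefix + "%'");
            }
            return pool;
        }
    }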
We did not identify this pattern when we were on 2.16 (which we moved away from, since the rollback on timeout of some XA transactions led to a possibly corrupted store and general instability, although over a longer timeframe).

Regards,


Le 05/10/2022 à 18:06, Justin Bertram a écrit :
>> Is there a known issue with performance of selectors on 2.24.0? In particular, one that may degrade over time.
>
> I'm not aware of any particular issue, per se, but generally speaking, using selectors on queue consumers is not good for performance. This is due to the queue scanning required to match messages to the selector. Performance won't degrade over time necessarily, but it will degrade as the complexity of the selector increases or as the number of messages increases or as the number of consumers increases (or any combination thereof).
>
> Couple of questions:
>  - Are selectors in use with the same volume of messages and consumers on the queues where you're not seeing any problems?
>  - Have you gathered any thread dumps to see what the broker is doing during these slow downs? If so, what did you find? If not, could you gather a handful of thread dumps so we can know what's happening over time?
>  - Have you gathered any metrics related to CPU utilization during these slow downs? If so, what did you find?
>  - Are there any "gaps" in the consumers' selectors such that some messages may not actually match any selector and therefore will not be consumed?
>  - How complex are your selectors?
>
> Generally speaking, performance will be better with *topic* subscribers using selectors. This is because the broker pre-sorts matching messages into the corresponding subscription as they arrive at the topic, which means that the matching logic is performed less often than in the queue use-case.
>
> I can't say for sure this is your problem. You may be hitting some other issue. Hopefully your answers to my questions will help narrow it down.
>
> FWIW, ActiveMQ Artemis 2.26.0 contains some performance optimizations for a few uncommonly complex selector use-cases [1].
>
>
> Justin
>
> [1] https://issues.apache.org/jira/browse/ARTEMIS-3962
>
> On Wed, Oct 5, 2022 at 9:42 AM <s...@cecurity.com.invalid> wrote:
>
>> Hello,
>>
>> Thank you for your response. The clients refused the downtime at the time for the store dump, and we ended up with a live cleanup of the queue that they couldn't consume fast enough (holding ~20M response messages).
>>
>> The occurrence of that specific WARNING diminished significantly after that and hasn't been seen again in over a week.
>>
>> We still have a suspected performance issue on that broker though: after a few hours' uptime, the consumption of that queue slows and messages pile up.
>>
>> My other queues on the same broker are at zero most of the time for an identical or similar volume over the same timeframe.
>>
>> At some point we had
>>
>> AMQ222144: Queue could not finish waiting executors. Try increasing the thread pool size
>>
>> which we did, by 50%; the warning disappeared but not the slowing issue.
>>
>> The specificity of the usage of that queue is that
>>
>> 1/ consumers use selectors to pick their messages of interest
>>
>> 2/ one of the client applications creates and destroys a lot of short-lived consumers. There may be more than 400 consumers connected at some point.
>>
>> Are there more specific documentation pointers available on dimensioning resources on the broker depending on some usage patterns?
>>
>> Is there a known issue with performance of selectors on 2.24.0? In particular, one that may degrade over time.
>>
>> It's happening even when the queue has not reached the paging threshold and still fits all in memory, and no significant iowait on the server side has been spotted.
>>
>> Memory heap usage of the VM seems stable and well below allocation. Netty threads are not specified on the acceptor; the default seems to be 96 and most of them are in epoll wait.
>>
>> Regards,
>>
>>
>> Le 22/09/2022 à 10:47, Clebert Suconic a écrit :
>>> I don't think it's critical. But I wanted to check your data to be able to help.
>>>
>>> On Thu, Sep 22, 2022 at 4:26 AM Clebert Suconic <clebert.suco...@gmail.com> wrote:
>>>
>>>> Also, inspect the file before you upload it.
>>>>
>>>> On Thu, Sep 22, 2022 at 4:25 AM Clebert Suconic <clebert.suco...@gmail.com> wrote:
>>>>
>>>>> Is there any chance you could add a print data of your journal to the JIRA?
>>>>>
>>>>> The Artemis data print has a --safe argument that obfuscates any data, leaving only the journal structure recorded.
>>>>>
>>>>> ./artemis data print --safe
>>>>>
>>>>> The system has to be stopped, or you could copy your journal to another server for this operation.
>>>>>
>>>>> On Tue, Sep 20, 2022 at 6:24 AM <s...@cecurity.com.invalid> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> This is related to my ticket https://issues.apache.org/jira/browse/ARTEMIS-3992
>>>>>>
>>>>>> We still have occasional spurts of messages like
>>>>>>
>>>>>> 2022-09-20 10:32:43,913 WARN [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 268 566 334 during compact replay
>>>>>> 2022-09-20 10:32:43,913 WARN [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 268 568 921 during compact replay
>>>>>> 2022-09-20 10:32:43,913 WARN [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 268 567 786 during compact replay
>>>>>> 2022-09-20 10:32:43,913 WARN [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 268 569 685 during compact replay
>>>>>> 2022-09-20 10:32:43,913 WARN [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 268 569 347 during compact replay
>>>>>> 2022-09-20 10:32:43,913 WARN [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 268 566 991 during compact replay
>>>>>>
>>>>>> and some consumers are dead slow (~4 msg/s). It may be their fault, but they struggle to find a client-side cause. The queue they are querying now holds 14M msg.
>>>>>>
>>>>>> I'm trying to determine
>>>>>>
>>>>>> a/ the severity of the problem exposed by those warnings. Can the store be corrupt? (We don't really have doubts about the physical layer; this is local storage on the server and no errors are reported by the OS.)
>>>>>>
>>>>>> b/ is there a possibility that these issues play a role in the performance problem?
>>>>>>
>>>>>> Still on 2.24.0, as described in the JIRA issue.
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> SL
>>>>>>
>>>>> --
>>>>> Clebert Suconic
>>>>>
>>>> --
>>>> Clebert Suconic
>>>>
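P.S. For anyone following the thread, here is a minimal sketch of the topic-subscriber-with-selector approach suggested above, assuming a JMS 2.0 capable client (e.g. the Core JMS client); the topic, subscription and header names are placeholders:

    import javax.jms.Connection;
    import javax.jms.JMSException;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;
    import javax.jms.Topic;

    public class TopicSelectorSketch {

        // With a topic, the broker evaluates the selector once, when the message arrives,
        // and sorts matching messages into the subscription's own queue, instead of
        // rescanning a shared queue for every consumer.
        static MessageConsumer subscribe(Connection connection, String prefix) throws JMSException {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = session.createTopic("the.shared.topic");       // placeholder name
            String selector = "routingKey LIKE '" + prefix + "%'";       // placeholder header
            return session.createSharedDurableConsumer(topic, "sub-" + prefix, selector);
        }
    }

With this shape, each prefix gets its own shared durable subscription, so the matching logic runs once per message on arrival rather than during queue scans for each consumer.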