Hello,

No, the failed connections come from external clients (I have neither the client environments nor their code). On the embedded broker, the server-side components use in-vm connectors, which do not seem to show the issue (and do not go through netty-ssl).
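For context, this is roughly what the two connection paths look like from the server side. Both are shown with the Artemis JMS client just for brevity (the external clients actually use the 5.x OpenWire client), and the host, port and trust-store values below are placeholders, not our real configuration:

import javax.jms.Connection;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class ConnectionPaths {
    public static void main(String[] args) throws Exception {
        // In-VM connector: stays inside the JVM embedding the broker and
        // never touches the netty-ssl acceptor. Assumes the embedded broker
        // is already started in this JVM.
        ActiveMQConnectionFactory inVm = new ActiveMQConnectionFactory("vm://0");

        // External clients go through the netty-ssl acceptor instead
        // (host, port and trust store here are placeholders).
        ActiveMQConnectionFactory overSsl = new ActiveMQConnectionFactory(
            "tcp://broker.example.com:61617?sslEnabled=true"
            + "&trustStorePath=/path/to/truststore.jks"
            + "&trustStorePassword=changeit");

        try (Connection c = inVm.createConnection()) {
            c.start();
        }
    }
}

The in-vm path bypasses the SSL acceptor entirely, which is why I don't think the server-side connections tell us much about these failures.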
We also deployed a standalone Artemis (2.16) to act as a sort of proxy broker in front of the embedded one, and we see connection failures from clients on it too. The bridges used to forward messages locally seem fine (but that is a different context; the clients use JMS over OpenWire).

No, I did not do CPU sampling with VisualVM. The issue occurs mostly in a production environment, and reliably reproducing the exact problem on a test environment has been a mixed bag.

I did capture more stack traces last night, at a point where the issue was occurring more frequently, and the netty threads seemed much less idle than during previous observations:

Name: Thread-50 (activemq-netty-threads)
State: BLOCKED on org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor@662692e8
owned by: Thread-95 (activemq-netty-threads)
Total blocked: 145 739
Total waited: 4 186

Stack trace:
org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor.getSslHandler(NettyAcceptor.java:492)
org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$4.initChannel(NettyAcceptor.java:403)
io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:953)
io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:610)
io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1461)
io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1126)
io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:651)
io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:515)
io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:428)
io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:487)
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:333)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

I found it a bit odd that so many netty threads were stuck at this point, but I'm not familiar with netty internals. (I put a minimal sketch of the contention pattern I think this shows at the bottom of this mail, below the quoted thread.)

On 07/01/2021 at 03:12, Tim Bain wrote:
> For the embedded 2.10.1 broker case, are you saying that connections failed
> when made from other threads in the process in which the broker was
> embedded? If so, that would seem to rule out the network, since traffic
> would never leave the host.
>
> You mentioned capturing a stack trace, but have you done CPU sampling via
> VisualVM or a similar tool? CPU sampling isn't a perfectly accurate
> technique, but often it gives enough information to narrow in on the cause
> of a problem (or to rule out certain possibilities).
>
> Tim
>
> On Wed, Jan 6, 2021, 10:34 AM Sébastien LETHIELLEUX <
> [email protected]> wrote:
>
>> Hello (again),
>>
>> I'm trying to find the root cause of a significant number of failed
>> connection attempts / broken existing connections on an Artemis broker.
>>
>> The issue has been reproduced on an embedded Artemis 2.10.1 and a
>> standalone 2.16.0 (Tomcat 9, OpenJDK 11).
>>
>> Two types of errors occur: timeouts during handshakes and broken
>> existing connections, such as:
>>
>> 2021-01-04 15:28:53,243 ERROR [org.apache.activemq.artemis.core.server]
>> AMQ224088: Timeout (10 seconds) on acceptor "netty-ssl" during protocol
>> handshake with /xxx.xxx.xxx.xxx:41760 has occurred.
>>
>> 2021-01-06 16:56:28,016 WARN {Thread-16
>> (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@f493a59)}
>> [org.apache.activemq.artemis.core.client] : AMQ212037: Connection
>> failure to /xxx.xxx.xxx.xxx:49918 has been detected: AMQ229014: Did not
>> receive data from /xxx.xxx.xxx.xxx:49918 within the 30,000ms connection
>> TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
>>
>> Both brokers are deployed on RHEL 7 with artemis-native and libaio (32
>> logical cores, plenty of RAM). Clients use JMS over OpenWire
>> (activemq-client).
>>
>> The investigations on the network infrastructure came up empty-handed,
>> so I'm trying to explore the possibility that something is going wrong
>> under Artemis' hood.
>>
>> Is it possible that the thread pool configured with remotingThreads is
>> too small (default values)? Looking at the thread stacks through JMX
>> seems to show plenty of threads happily idle.
>>
>> The clients are known to open and close a lot of connections (we know
>> it's wrong, and now they know it too, but it should still work). The
>> number of open connections is usually around 90-100, which hardly seems
>> like an unbearable burden.
>>
>> Any ideas or suggestions on what to check/monitor/etc.?
>>
>> Regards,
>>
>> SL
>>
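PS: the minimal sketch mentioned above. This is not Artemis code; the class and method names are made up for illustration. It only shows the pattern I think the thread dump exposes: every new channel's initializer goes through a method synchronized on one shared acceptor object, so when the body of that method is slow (for example building an SSLEngine from the keystore), all the other event-loop threads accepting connections pile up as BLOCKED on the same monitor, and handshakes queued behind it could run into the 10-second acceptor timeout.

import java.util.concurrent.TimeUnit;

public class AcceptorContentionSketch {

    static class FakeAcceptor {
        // Stand-in for a per-connection init step that is synchronized on
        // the shared acceptor instance and slow (e.g. SSL/keystore work).
        synchronized void initChannel(String channel) throws InterruptedException {
            Thread.sleep(500); // pretend the SSL setup takes 500 ms
            System.out.println(Thread.currentThread().getName()
                + " finished init of " + channel);
        }
    }

    public static void main(String[] args) throws Exception {
        FakeAcceptor acceptor = new FakeAcceptor();

        // Several threads playing the role of netty event loops, each trying
        // to initialize a freshly accepted channel at the same time.
        for (int i = 0; i < 8; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    acceptor.initChannel("channel-" + id);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "fake-netty-thread-" + id).start();
        }

        // While the first thread holds the acceptor monitor, dump the state
        // of the others: they should show up as BLOCKED, like in the JMX dump.
        TimeUnit.MILLISECONDS.sleep(200);
        Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.getName().startsWith("fake-netty-thread-"))
            .forEach(t -> System.out.println(t.getName() + " -> " + t.getState()));
    }
}

If my reading is right, running this shows all but the thread holding the monitor as BLOCKED, which is the same state distribution I see in the JMX dump above. Please tell me if I'm misinterpreting it.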
