Hello,

No, the failed connections are from external clients (I have access to
neither the client environments nor their code). On the embedded broker,
the server side uses in-VM connectors, which do not seem to have such
issues (and do not use netty-ssl).
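
To illustrate the difference (a minimal sketch with the Artemis JMS
client, not our actual code; the URLs are placeholders): an in-VM
connection never crosses the netty-ssl acceptor, so it skips the Netty
event loops and the TLS handshake entirely.

import javax.jms.Connection;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class TransportSketch {
    public static void main(String[] args) throws Exception {
        // Server side, in the same JVM as the embedded broker: the in-VM
        // transport involves no Netty event loop and no SslHandler.
        try (Connection inVm =
                 new ActiveMQConnectionFactory("vm://0").createConnection()) {
            inVm.start();
        }
        // External clients instead go through the netty-ssl acceptor,
        // along the lines of:
        //   new ActiveMQConnectionFactory("tcp://broker-host:61617?sslEnabled=true")
    }
}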

We deployed a standalone Artemis (2.16) to act as a sort of proxy
broker for the embedded one, and we see connection failures from clients
on it too. The bridges used to forward messages locally seem fine (but
that is a different context; the clients use JMS over OpenWire).

No, I did not do CPU sampling with VisualVM. The issue happens mostly
in a production environment, and trying to reproduce the exact problem
reliably in a test environment has been a mixed bag.

I did capture more stack traces last night, at a point when the issue
was occurring more frequently, and the netty threads seemed much less
idle than during previous observations:

Name: Thread-50 (activemq-netty-threads)
State: BLOCKED on
org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor@662692e8
owned by: Thread-95 (activemq-netty-threads)
Total blocked: 145 739  Total waited: 4 186

Stack trace:
org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor.getSslHandler(NettyAcceptor.java:492)
org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$4.initChannel(NettyAcceptor.java:403)
io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:953)
io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:610)
io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1461)
io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1126)
io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:651)
io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:515)
io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:428)
io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:487)
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:333)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

I found it a bit odd that so many netty threads were stuck at this
point, but I'm not familiar with Netty internals.
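
To make sure I understand what I'm seeing, here is a minimal,
self-contained sketch (my own illustration, not Artemis code) of the
pattern the dump suggests: if every initChannel() funnels through one
synchronized method on the shared acceptor, then a single slow
SslHandler creation serializes all event-loop threads behind that
monitor, which would look exactly like many threads BLOCKED on the
NettyAcceptor instance.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedAcceptorDemo {

    // Stand-in for the shared acceptor: one monitor guards channel
    // initialization for every incoming connection, no matter how many
    // event-loop threads are accepting.
    static class Acceptor {
        synchronized void initChannel() throws InterruptedException {
            Thread.sleep(50); // simulate slow per-connection SSL setup
        }
    }

    public static void main(String[] args) throws Exception {
        Acceptor acceptor = new Acceptor();
        ExecutorService eventLoops = Executors.newFixedThreadPool(8);
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) { // 100 incoming connections
            eventLoops.submit(() -> {
                try {
                    acceptor.initChannel();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        eventLoops.shutdown();
        eventLoops.awaitTermination(1, TimeUnit.MINUTES);
        // Expect ~100 * 50 ms = ~5 s despite 8 threads: all but one thread
        // sits BLOCKED on the acceptor monitor, as in the dump above.
        System.out.printf("elapsed: %d ms%n",
                          (System.nanoTime() - start) / 1_000_000);
    }
}

If that reading is right, the handshake timeouts under load would follow
directly: new connections queue behind the monitor until the 10-second
handshake deadline expires.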


On 07/01/2021 at 03:12, Tim Bain wrote:
> For the embedded 2.10.1 broker case, are you saying that connections failed
> when made from other threads in the process in which the broker was
> embedded? If so, that would seem to rule out the network, since traffic
> would never leave the host.
>
> You mentioned capturing a stack trace, but have you done CPU sampling via
> VisualVM or a similar tool? CPU sampling isn't a perfectly accurate
> technique, but often it gives enough information to narrow in on the cause
> of a problem (or to rule out certain possibilities).
>
> Tim
>
> On Wed, Jan 6, 2021, 10:34 AM Sébastien LETHIELLEUX <
> [email protected]> wrote:
>
>> Hello (again),
>>
>> I'm trying to find the root cause of a significant number of failed
>> connection attempts / broken existing connections on an Artemis broker.
>>
>> The issue has been reproduced on an embedded Artemis 2.10.1 and on a
>> standalone 2.16.0 (Tomcat 9, OpenJDK 11).
>>
>> Two types of errors occur: timeouts during handshakes and broken
>> existing connections. For example:
>>
>> 2021-01-04 15:28:53,243 ERROR [org.apache.activemq.artemis.core.server]
>> AMQ224088: Timeout (10 seconds) on acceptor "netty-ssl" during protocol
>> handshake with /xxx.xxx.xxx.xxx:41760 has occurred.
>>
>> 2021-01-06 16:56:28,016 WARN  {Thread-16
>>
>> (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@f493a59
>> )}
>> [org.apache.activemq.artemis.core.client] : AMQ212037: Connection
>> failure to /xxx.xxx.xxx.xxx:49918 has been detected: AMQ229014: Did not
>> receive data from /xxx.xxx.xxx.xxx:49918 within the 30,000ms connection
>> TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
>>
>> Both brokers were deployed on RHEL7 with artemis-native and libaio (32
>> logical cores, plenty of RAM). Clients use JMS over OpenWire
>> (activemq-client).
>>
>> The investigation of the network infrastructure came up empty-handed,
>> so I'm trying to explore the possibility that something is going wrong
>> under Artemis's hood.
>>
>> Is it possible that the thread pool configured with remotingThreads is
>> too small (we are using the default values)? The thread stacks observed
>> via JMX seem to show plenty of happily idle threads.
>>
>> The clients are known to open and close a lot of connections (we know
>> it's wrong, and now they know it too, but it should still work). The
>> number of open connections is usually around 90-100, which hardly seems
>> like an unbearable burden.
>>
>> Any ideas or suggestions on what to check/monitor/etc.?
>>
>> Regards,
>>
>> SL
>>
>>
