On 7/17/25 09:23, Ciaran wrote:
Hi all,

It's been a while, but I'm occasionally seeing 'fatal' connection failures
for my client when running inside the Azure Kubernetes environment.

I've configured the Transport and ConnectionOptions like:

         ConnectionOptions options = new ConnectionOptions();
         options.sslOptions().sslEnabled(this.options.isUseSSL());
         options.transportOptions().useWebSockets(false);
         options.transportOptions().webSocketPath("");
         options.reconnectOptions()
             .reconnectEnabled(true)
             .useReconnectBackOff(true)
             .reconnectDelay(5000)
             .maxReconnectDelay(10240001)
             .maxReconnectAttempts(12)
             .warnAfterReconnectAttempts(1);

I am /not/ specifying idleTimeout on the options.

During testing when I was synthesizing connection failures I saw quite
clear logging that connection failures and retries were occurring. But in
the specific instance within the cluster we see no such logging, just the
exception I've included at the end of this email.

We've run into a whole bunch of issues when using Java within Kubernetes, a
fair chunk of which I've been able to trace back to problems with k8s
silently dropping idle TCP connections. We resolved those by configuring
the tcpKeepAlive settings in the relevant Java libraries, and I can't help
but think this might be a similar issue.

I've had a look at the available options and I see that the
transportOptions do support a tcpKeepAlive, but I'm unclear on how we could
configure the periodicity. I /think/ I may need to enable epoll support,
but it's not clear to me (sorry) how to achieve that.

Before I go down this rabbit hole, would you expect the configuration I've
specified to behave in the manner of the exception below, with no visible
logging of retries? Could this be related to me not specifying the
idleTimeout?

I should note that this is extremely rare, we can run continuously for
several weeks with no issue, and the load on the connection is very, very
low.

This is the exception we see.

Caused by: org.apache.qpid.protonj2.client.exceptions.ClientLinkRemotelyClosedException: Link remotely closed without explanation from the remote
     at org.apache.qpid.protonj2.client.impl.ClientExceptionSupport.convertToLinkClosedException(ClientExceptionSupport.java:217)
     at org.apache.qpid.protonj2.client.impl.ClientLinkType.handleRemoteCloseOrDetach(ClientLinkType.java:364)
     at org.apache.qpid.protonj2.engine.impl.ProtonEndpoint.fireRemoteClose(ProtonEndpoint.java:139)
     at org.apache.qpid.protonj2.engine.impl.ProtonLink.remoteDetach(ProtonLink.java:673)
     at org.apache.qpid.protonj2.engine.impl.ProtonSession.remoteDetach(ProtonSession.java:545)
     at org.apache.qpid.protonj2.engine.impl.ProtonConnection.handleDetach(ProtonConnection.java:547)
     at org.apache.qpid.protonj2.engine.impl.ProtonPerformativeHandler.handleDetach(ProtonPerformativeHandler.java:148)
     at org.apache.qpid.protonj2.engine.impl.ProtonPerformativeHandler.handleDetach(ProtonPerformativeHandler.java:43)
     at org.apache.qpid.protonj2.types.transport.Detach.invoke(Detach.java:132)
     at org.apache.qpid.protonj2.engine.IncomingAMQPEnvelope.invoke(IncomingAMQPEnvelope.java:69)
     at org.apache.qpid.protonj2.engine.impl.ProtonPerformativeHandler.handleRead(ProtonPerformativeHandler.java:68)
     at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.invokeHandlerRead(ProtonEngineHandlerContext.java:187)
     at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.fireRead(ProtonEngineHandlerContext.java:147)
     at org.apache.qpid.protonj2.engine.impl.ProtonFrameLoggingHandler.handleRead(ProtonFrameLoggingHandler.java:101)
     at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.invokeHandlerRead(ProtonEngineHandlerContext.java:187)
     at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.fireRead(ProtonEngineHandlerContext.java:147)
     at org.apache.qpid.protonj2.engine.impl.ProtonFrameDecodingHandler$FrameBodyParsingStage.parse(ProtonFrameDecodingHandler.java:387)
     at org.apache.qpid.protonj2.engine.impl.ProtonFrameDecodingHandler$FrameSizeParsingStage.parse(ProtonFrameDecodingHandler.java:265)
     at org.apache.qpid.protonj2.engine.impl.ProtonFrameDecodingHandler.handleRead(ProtonFrameDecodingHandler.java:99)
     at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.invokeHandlerRead(ProtonEngineHandlerContext.java:199)
     at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.fireRead(ProtonEngineHandlerContext.java:132)
     at org.apache.qpid.protonj2.engine.impl.ProtonEnginePipeline.fireRead(ProtonEnginePipeline.java:301)
     at org.apache.qpid.protonj2.engine.impl.ProtonEngine.ingest(ProtonEngine.java:266)
     at org.apache.qpid.protonj2.engine.impl.ProtonEngine.ingest(ProtonEngine.java:54)
     at org.apache.qpid.protonj2.client.impl.ClientTransportListener.transportRead(ClientTransportListener.java:59)
     at org.apache.qpid.protonj2.client.transport.netty4.TcpTransport$NettyDefaultHandler.dispatchReadBuffer(TcpTransport.java:522)
     at org.apache.qpid.protonj2.client.transport.netty4.TcpTransport$NettyTcpTransportHandler.channelRead0(TcpTransport.java:533)
     at org.apache.qpid.protonj2.client.transport.netty4.TcpTransport$NettyTcpTransportHandler.channelRead0(TcpTransport.java:529)
     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
     at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
     at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
     at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
     at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
     at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
     ... 1 more

This error indicates that the remote closed a link via a Detach frame, not the connection itself, so I wouldn't expect the client to attempt any reconnect in this case.  A remote close of a link has to be handled by the application, which should either recreate the link (sender or receiver) or completely close out the client and rebuild state from the start.  It is possible the Azure end is closing out a link that has been idle too long, which isn't something the client can manage, as it has no insight into the remote's configuration or requirements.
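
As a rough illustration of the "recreate the link" option, here is a minimal, library-independent sketch of the pattern. Everything in it is a hypothetical stand-in rather than protonj2 API: LinkClosedException plays the role of ClientLinkRemotelyClosedException, and LinkFactory stands for whatever code in your application opens the sender or receiver on the existing connection. The retry limit is likewise an assumption you would tune yourself.

```java
public class RelinkSketch {

    /** Hypothetical stand-in for the client's "link remotely closed" exception. */
    static class LinkClosedException extends Exception {
        LinkClosedException(String message) {
            super(message);
        }
    }

    /** Hypothetical link: a single read that fails if the remote has detached the link. */
    interface Link<T> {
        T read() throws LinkClosedException;
    }

    /** Hypothetical factory: opens a fresh link on the (still open) connection. */
    interface LinkFactory<T> {
        Link<T> open() throws Exception;
    }

    /** Wraps a link and reopens it when the remote closes it, up to a limit per read. */
    static final class RelinkingReader<T> {
        private final LinkFactory<T> factory;
        private final int maxRelinks;
        private Link<T> link;
        private int relinks;

        RelinkingReader(LinkFactory<T> factory, int maxRelinks) {
            this.factory = factory;
            this.maxRelinks = maxRelinks;
        }

        T read() throws Exception {
            int attempts = 0;
            while (true) {
                if (link == null) {
                    link = factory.open(); // lazily (re)create the link
                }
                try {
                    return link.read();
                } catch (LinkClosedException e) {
                    link = null; // the old link is dead, discard it
                    if (++attempts > maxRelinks) {
                        throw e; // give up and let the caller rebuild everything
                    }
                    relinks++;
                }
            }
        }

        int relinkCount() {
            return relinks;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulation: the first link is detached by the remote, the second one works.
        int[] opened = {0};
        RelinkingReader<String> reader = new RelinkingReader<>(() -> {
            int n = ++opened[0];
            return () -> {
                if (n == 1) {
                    throw new LinkClosedException("Link remotely closed");
                }
                return "message via link " + n;
            };
        }, 3);

        System.out.println(reader.read());
        System.out.println("relinks: " + reader.relinkCount());
    }
}
```

In a real client the factory body would open the receiver or sender again on the existing connection, and once the limit is exceeded the application would fall back to the second option above: close the client and rebuild from the start.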

--
Tim Bish


