On 7/17/25 09:23, Ciaran wrote:
Hi all,
It's been a while, but I'm occasionally seeing 'fatal' connection failures
for my client when running inside the Azure Kubernetes environment.
I've configured the Transport and ConnectionOptions like:
ConnectionOptions options = new ConnectionOptions();
options.sslOptions().sslEnabled(this.options.isUseSSL());
options.transportOptions().useWebSockets(false);
options.transportOptions().webSocketPath("");
options.reconnectOptions()
    .reconnectEnabled(true)
    .useReconnectBackOff(true)
    .reconnectDelay(5000)
    .maxReconnectDelay(10240001)
    .maxReconnectAttempts(12)
    .warnAfterReconnectAttempts(1);
I am /not/ specifying idleTimeout on the options.
During testing when I was synthesizing connection failures I saw quite
clear logging that connection failures and retries were occurring. But in
the specific instance within the cluster we see no such logging, just the
exception I've included at the end of this email.
We've run into a whole bunch of issues when using Java within Kubernetes.
I've been able to trace a fair chunk of them back to k8s silently dropping
idle TCP connections, which we've resolved by configuring the tcpKeepAlive
settings in the relevant Java libraries, and I can't help but think this
might be a similar issue.
I've had a look at the available options and I see that the
TransportOptions do support tcpKeepAlive, but I'm unclear on how we could
configure the periodicity. I /think/ I may need to enable epoll support,
but it's not clear to me (sorry) how to achieve that.
Before I go down this rabbit hole, would you expect the configuration I've
specified to produce the exception below with no visible logging of
retries? Could this be related to me not specifying the idleTimeout?
I should note that this is extremely rare: we can run continuously for
several weeks with no issue, and the load on the connection is very, very
low.
This is the exception we see.
Caused by: org.apache.qpid.protonj2.client.exceptions.ClientLinkRemotelyClosedException: Link remotely closed without explanation from the remote
    at org.apache.qpid.protonj2.client.impl.ClientExceptionSupport.convertToLinkClosedException(ClientExceptionSupport.java:217)
    at org.apache.qpid.protonj2.client.impl.ClientLinkType.handleRemoteCloseOrDetach(ClientLinkType.java:364)
    at org.apache.qpid.protonj2.engine.impl.ProtonEndpoint.fireRemoteClose(ProtonEndpoint.java:139)
    at org.apache.qpid.protonj2.engine.impl.ProtonLink.remoteDetach(ProtonLink.java:673)
    at org.apache.qpid.protonj2.engine.impl.ProtonSession.remoteDetach(ProtonSession.java:545)
    at org.apache.qpid.protonj2.engine.impl.ProtonConnection.handleDetach(ProtonConnection.java:547)
    at org.apache.qpid.protonj2.engine.impl.ProtonPerformativeHandler.handleDetach(ProtonPerformativeHandler.java:148)
    at org.apache.qpid.protonj2.engine.impl.ProtonPerformativeHandler.handleDetach(ProtonPerformativeHandler.java:43)
    at org.apache.qpid.protonj2.types.transport.Detach.invoke(Detach.java:132)
    at org.apache.qpid.protonj2.engine.IncomingAMQPEnvelope.invoke(IncomingAMQPEnvelope.java:69)
    at org.apache.qpid.protonj2.engine.impl.ProtonPerformativeHandler.handleRead(ProtonPerformativeHandler.java:68)
    at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.invokeHandlerRead(ProtonEngineHandlerContext.java:187)
    at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.fireRead(ProtonEngineHandlerContext.java:147)
    at org.apache.qpid.protonj2.engine.impl.ProtonFrameLoggingHandler.handleRead(ProtonFrameLoggingHandler.java:101)
    at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.invokeHandlerRead(ProtonEngineHandlerContext.java:187)
    at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.fireRead(ProtonEngineHandlerContext.java:147)
    at org.apache.qpid.protonj2.engine.impl.ProtonFrameDecodingHandler$FrameBodyParsingStage.parse(ProtonFrameDecodingHandler.java:387)
    at org.apache.qpid.protonj2.engine.impl.ProtonFrameDecodingHandler$FrameSizeParsingStage.parse(ProtonFrameDecodingHandler.java:265)
    at org.apache.qpid.protonj2.engine.impl.ProtonFrameDecodingHandler.handleRead(ProtonFrameDecodingHandler.java:99)
    at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.invokeHandlerRead(ProtonEngineHandlerContext.java:199)
    at org.apache.qpid.protonj2.engine.impl.ProtonEngineHandlerContext.fireRead(ProtonEngineHandlerContext.java:132)
    at org.apache.qpid.protonj2.engine.impl.ProtonEnginePipeline.fireRead(ProtonEnginePipeline.java:301)
    at org.apache.qpid.protonj2.engine.impl.ProtonEngine.ingest(ProtonEngine.java:266)
    at org.apache.qpid.protonj2.engine.impl.ProtonEngine.ingest(ProtonEngine.java:54)
    at org.apache.qpid.protonj2.client.impl.ClientTransportListener.transportRead(ClientTransportListener.java:59)
    at org.apache.qpid.protonj2.client.transport.netty4.TcpTransport$NettyDefaultHandler.dispatchReadBuffer(TcpTransport.java:522)
    at org.apache.qpid.protonj2.client.transport.netty4.TcpTransport$NettyTcpTransportHandler.channelRead0(TcpTransport.java:533)
    at org.apache.qpid.protonj2.client.transport.netty4.TcpTransport$NettyTcpTransportHandler.channelRead0(TcpTransport.java:529)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    ... 1 more
This error indicates that the remote closed a link via a detach frame,
not the connection itself, so I wouldn't expect the client to attempt
any reconnect in this case. A remote close of a link must be handled by
the application, which can either recreate the link (sender or receiver)
or completely close out the client and rebuild state from the start. It
is possible that the Azure end is closing out a link that has been idle
too long, which isn't something the client can manage, as it has no
insight into the remote and its configuration or requirements.
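
As a rough illustration of the "recreate the link" option, here is a
minimal, library-independent sketch. Everything in it is a hypothetical
stand-in rather than the protonj2 API: Link plays the role of a receiver,
LinkRemotelyClosedException plays the role of
ClientLinkRemotelyClosedException, and the Supplier is whatever code the
application would use to open a fresh link on the still-open connection.
The retry limit of 3 is likewise an arbitrary assumption.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Supplier;

public class LinkRecovery {

    /** Hypothetical stand-in for the client's "link remotely closed" exception. */
    static class LinkRemotelyClosedException extends Exception {
        LinkRemotelyClosedException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a receiver link. */
    interface Link {
        String receive() throws LinkRemotelyClosedException;
    }

    /** Wraps a link and reopens it when the remote detaches it. */
    static final class RecoveringReceiver {
        private final Supplier<Link> linkFactory;
        private final int maxRecreates;
        private Link current;
        private int recreateCount;

        RecoveringReceiver(Supplier<Link> linkFactory, int maxRecreates) {
            this.linkFactory = linkFactory;
            this.maxRecreates = maxRecreates;
            this.current = linkFactory.get();
        }

        String receive() throws LinkRemotelyClosedException {
            int attempts = 0;
            while (true) {
                try {
                    return current.receive();
                } catch (LinkRemotelyClosedException e) {
                    if (attempts++ >= maxRecreates) {
                        // Give up; the caller may rebuild the whole client instead.
                        throw e;
                    }
                    // Only the link was closed, so open a replacement link.
                    current = linkFactory.get();
                    recreateCount++;
                }
            }
        }

        int recreateCount() {
            return recreateCount;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated remote: the first link is detached, the replacement works.
        Queue<Link> links = new ArrayDeque<>();
        links.add(() -> { throw new LinkRemotelyClosedException("Link remotely closed"); });
        links.add(() -> "hello");

        RecoveringReceiver receiver = new RecoveringReceiver(links::poll, 3);
        System.out.println(receiver.receive());
        System.out.println("links recreated: " + receiver.recreateCount());
    }
}
```

In a real application the factory would presumably wrap whatever call
opens the sender or receiver on the existing connection, and the
give-up branch is where closing the client and rebuilding state from
the start would go.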
--
Tim Bish