Kan Maung created CASSANDRA-19696:
-------------------------------------

             Summary: Observed large number of Inbound / Outbound connection disconnect / reconnects in log
                 Key: CASSANDRA-19696
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19696
             Project: Cassandra
          Issue Type: Bug
            Reporter: Kan Maung

We are seeing hundreds of InboundConnection established / closed messages on several of our clusters running Apache Cassandra 4.0.10. Looking at 'nodetool tpstats', gossip appears to be close to its timeout value.

It affects both the LARGE_MESSAGES and URGENT_MESSAGES connections. In the example below, the disconnect happens just 20 seconds after the connection was established. The two nodes are in the same datacenter, so there should be no geographical latency between them. This cluster (cluster_id=111) has a very standard cassandra.yaml for our environment.

Gossiper uses MessagingService to send messages from the source to the destination over an OutboundConnection. Handling depends on the message type: LARGE_MESSAGES are enqueued in a separate thread pool, while URGENT_MESSAGES are delivered with Verb.Priority.P0.

127.10.20.88 cassandra.log:

2024-05-13 02:06:13,805 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 ip_address=127.10.20.88 InboundConnectionInitiator.java:529 - /127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471 messaging connection established, version = 12, framing = CRC, encryption = encrypted(...)
2024-05-13 02:06:32,201 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.20.88 OutboundConnection.java:1059 - /127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel closed by provider

127.10.30.171 log:

2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1059 - /127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel closed by provider io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1059 - /127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel closed by provider io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1153 - /127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869 successfully connected, version = 12, framing = CRC, encryption = encrypted(...)
2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1153 - /127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284 successfully connected, version = 12, framing = LZ4, encryption = encrypted(...)
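
To put numbers on the churn, a rough sketch like the one below could be run over cassandra.log. It is not part of Cassandra or nodetool. The regular expression is inferred only from the log lines quoted in this report, and the names LINE_RE, parse_events and churn_summary are hypothetical helpers introduced for illustration. It counts connect and close events per connection type and, for each source/destination/type triple, measures how long after a close the next connect was logged. It assumes the lines are in chronological order.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, inferred from the excerpts in this report only.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*? - "
    r"(?P<src>/[\d.]+:\d+)(?:\(/[\d.]+:\d+\))?->(?P<dst>/[\d.]+:\d+)"
    r"-(?P<kind>[A-Z_]+)-(?P<id>[0-9a-f]+) "
    r"(?P<event>messaging connection established"
    r"|successfully connected|channel closed by provider)"
)

CONNECT_EVENTS = {"messaging connection established", "successfully connected"}


def parse_events(lines):
    """Yield (timestamp, src, dst, kind, event) for each recognised line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a connection event in the assumed format
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        event = "connected" if m["event"] in CONNECT_EVENTS else "closed"
        yield ts, m["src"], m["dst"], m["kind"], event


def churn_summary(lines):
    """Return (event counts, reconnect gaps in seconds)."""
    counts = Counter()
    last_closed = {}  # (src, dst, kind) -> time of the most recent close
    gaps = []
    for ts, src, dst, kind, event in parse_events(lines):
        counts[(kind, event)] += 1
        key = (src, dst, kind)
        if event == "closed":
            last_closed[key] = ts
        elif key in last_closed:
            gaps.append((key, (ts - last_closed.pop(key)).total_seconds()))
    return counts, gaps


# The four lines from the 127.10.30.171 log in this report.
SAMPLE = """
2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1059 - /127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel closed by provider io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1059 - /127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel closed by provider io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1153 - /127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869 successfully connected, version = 12, framing = CRC, encryption = encrypted(...)
2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1153 - /127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284 successfully connected, version = 12, framing = LZ4, encryption = encrypted(...)
"""

counts, gaps = churn_summary(SAMPLE.splitlines())
for (kind, event), n in sorted(counts.items()):
    print(f"{kind:16} {event:10} {n}")
for (src, dst, kind), seconds in gaps:
    print(f"{src} -> {dst} {kind}: reconnected after {seconds:.3f}s")
```

On the excerpt above, this pairs the URGENT_MESSAGES close at 02:05:46,892 with the reconnect to 127.10.20.88 at 02:06:13,804. Running it over a full day of logs would show whether the reconnect gaps cluster around a fixed interval, which may help tell a timeout-driven pattern from random network drops.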