Kan Maung created CASSANDRA-19696:
-------------------------------------

             Summary: Observed large number of Inbound / Outbound connection 
disconnect / reconnects in log
                 Key: CASSANDRA-19696
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19696
             Project: Cassandra
          Issue Type: Bug
            Reporter: Kan Maung


We are seeing hundreds of InboundConnection established / closed messages on 
several of our clusters running Apache Cassandra 4.0.10. Looking at 'nodetool 
tpstats', it seems gossip is close to the timeout value.

It affects both the LargeMessage and UrgentMessage connections.

 


In the example below, a connection is closed about 20 seconds after it was 
established. The two nodes are in the same datacenter, so there should be no 
geographical latency between them. This cluster (cluster_id=111) uses a very 
standard cassandra.yaml for our environment.

Gossiper uses MessagingService to send messages from the source to the 
destination over an OutboundConnection.

Handling depends on the message type: LARGE_MESSAGES are enqueued in a 
separate thread pool, while URGENT_MESSAGES are delivered with Verb.Priority.P0.


127.10.20.88 cassandra.log:

2024-05-13 02:06:13,805 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 
ip_address=127.10.20.88 InboundConnectionInitiator.java:529 - 
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471
 messaging connection established, version = 12, framing = CRC, encryption = 
encrypted(...)

2024-05-13 02:06:32,201 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.20.88 OutboundConnection.java:1059 - 
/127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel closed 
by provider

 


127.10.30.171 log:

2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1059 - 
/127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel 
closed by provider

io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection timed out

2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1059 - 
/127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel closed 
by provider

io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection timed out

2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1153 - 
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869
 successfully connected, version = 12, framing = CRC, encryption = 
encrypted(...)

2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1153 - 
/127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284
 successfully connected, version = 12, framing = LZ4, encryption = 
encrypted(...)
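
To quantify the churn rather than eyeball it, the close and reconnect events 
can be paired per peer and connection type. The following is a minimal sketch, 
not part of Cassandra or nodetool: it assumes each event occupies a single line 
in the real log file (the excerpts above are wrapped by the mail client), and 
the names LINE_RE, parse_event and reconnect_gaps are hypothetical helpers 
introduced here for illustration.

```python
import re
from datetime import datetime

# Matches the three event messages that appear in the excerpts above.
# Assumption: in the real log file each event is on one unwrapped line.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*? - "
    r"(?P<src>/[\d.]+:\d+)(?:\(/[\d.]+:\d+\))?->(?P<dst>/[\d.]+:\d+)"
    r"-(?P<kind>[A-Z_]+)-(?P<conn_id>[0-9a-f]+)\s+"
    r"(?P<event>channel closed by provider|successfully connected"
    r"|messaging connection established)"
)


def parse_event(line):
    """Return (timestamp, destination, connection type, event) or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # stack traces and unrelated lines are skipped
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("dst"), m.group("kind"), m.group("event")


def reconnect_gaps(lines):
    """Pair each outbound close with the next outbound connect to the same
    destination and connection type; return (dst, kind, gap_seconds) tuples."""
    last_closed = {}
    gaps = []
    for line in lines:
        parsed = parse_event(line)
        if parsed is None:
            continue
        ts, dst, kind, event = parsed
        key = (dst, kind)
        if event == "channel closed by provider":
            last_closed[key] = ts
        elif event == "successfully connected" and key in last_closed:
            gap = (ts - last_closed.pop(key)).total_seconds()
            gaps.append((dst, kind, gap))
    return gaps


# Three lines from the 127.10.30.171 excerpt above, joined back onto one line each.
SAMPLE = [
    "2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 "
    "ip_address=127.10.30.171 OutboundConnection.java:1059 - "
    "/127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f "
    "channel closed by provider",
    "2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 "
    "ip_address=127.10.30.171 OutboundConnection.java:1059 - "
    "/127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 "
    "channel closed by provider",
    "2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 "
    "ip_address=127.10.30.171 OutboundConnection.java:1153 - "
    "/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-"
    "URGENT_MESSAGES-155d9869 successfully connected, version = 12, "
    "framing = CRC, encryption = encrypted(...)",
]

if __name__ == "__main__":
    for dst, kind, gap in reconnect_gaps(SAMPLE):
        print(f"{kind} to {dst}: reconnected after {gap:.3f} s")
```

On the three sample lines this pairs the URGENT_MESSAGES close at 02:05:46,892 
with the reconnect at 02:06:13,804 to /127.10.20.88:7000, a gap of 26.912 
seconds. Running it over the full log of each node would show whether the gaps 
cluster around a fixed value, which might point at a timeout setting rather 
than random network loss.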



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
