Hi!

I have a design question about the communication layer in Cassandra. Some context first:

1.      I am using Cassandra in a high inter-node communication latency
   environment, e.g latency between nodes is higher than 60 ms, like in
   multi-datacenter environments.
2.      Currently, I am running my tests in a three node cluster, with
   replication factor equal to three.
3.      I am working in a low throughput scenario, where I am trying to
   reduce the latency as much as possible for quorum operations (both
   reads and writes)
4.      All my query-related messages between nodes can be considered
   small (all query-related messages would most likely go through the
   smallMessageChannel).

I am trying to understand some of the design decisions made on the org.apache.cassandra.net.async.OutboundMessagingConnection and org.apache.cassandra.net.async.OutboundMessagingPool.

If I disable completely coalescing (e.g. using otc_coalescing_strategy as disabled), and I send two almost simultaneous request that affects the same set of replicas, after the coordinator sends the message to one of the data nodes. In the data node, the fastest query would complete and use the OutboundMessagingConnection and block until a TCP ACK is received, then all the other pending messages in the data node would be sent in the next TCP PSH. I notice this by running tcpdump and Wireshark in the nodes and analyzing the TCP connections between nodes.

My question is: Why is Cassandra using TCP in a synchronous manner? and not using some type of windowing to allow more messages to go in the wire before waiting for an ack?. Is this because of some of the guarantees that Cassandra provides?

I know coalescing could help, but if the messages are separated by more than the window (200 us by default in the fix setting), then it would need to wait for a whole RTT (60 ms), before it is sent, so for many operations in my system, I see that the latency perceived is on average 2 RTT instead of the quorum round trip that should be the slowest of all the communication paths.

I don't know enough about Netty and TCP to figure out this by myself, but I think I understand the basics of OutboundMessagingConnection and OutboundMessagingConnectionPool, so I am not sure why that requirement for synchronous TCP PSH and ACK is needed.

Thanks a lot for your help! Cassandra code is really good, and I was able to understand many of the components thanks to it.

Best regards,
Enrique Saurez

Reply via email to