Andrea Leopardi created CASSANDRA-19753:
-------------------------------------------

             Summary: Not getting responses with concurrent stream IDs in native protocol v5
                 Key: CASSANDRA-19753
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19753
             Project: Cassandra
          Issue Type: Bug
            Reporter: Andrea Leopardi
         Attachments: xandra.log

This is not gonna be an easy bug to report or to give a great set of repro 
steps for, so apologies in advance. I’m one of the authors and the maintainer 
of [Xandra|https://github.com/whatyouhide/xandra], the Cassandra client for 
Elixir.

We noticed an issue with request timeouts in a new version of our client. Just 
for reference, the issue is [this 
one|https://github.com/whatyouhide/xandra/issues/356].

After some debugging, we figured out that the issue was limited to *native 
protocol v5*. With native protocol v5, the issue shows up in C* 4.1 and 5.0. 
With native protocol v4, those versions (4.1 and 5.0) both work fine. I'm 
running C* in a Docker container, but I've had folks reproduce this with all 
sorts of C* setups.

h2. The Issue

The new version of our client issues requests concurrently. We assign each 
request a sequential stream ID ({{1}}, {{2}}, ...). To the best of my 
knowledge, this is compliant with [section 2.4.1.3 of the native protocol v5 
spec|https://github.com/apache/cassandra/blob/e7cf38b5de6f804ce121e7a676576135db0c4bb1/doc/native_protocol_v5.spec#L316C1-L316C9].
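For readers less familiar with stream IDs, here is a minimal Python sketch of 
the scheme described above. It is an illustration, not Xandra's actual code 
(which is Elixir). It assumes the 9-byte envelope header shared by v4 and v5 
(version, flags, a signed 2-byte stream ID, opcode, 4-byte body length) and 
that a v5 response carries version byte {{0x85}}. {{StreamTracker}} and 
{{encode_envelope}} are hypothetical names. In v5 this envelope travels inside 
the outer CRC-protected frames rather than bare on the socket; that layer is 
left out here. A stream ID that stays in {{pending}} is the symptom this 
report describes.

```python
import struct

# Opcodes from the native protocol spec.
OPCODE_QUERY = 0x07   # request
OPCODE_RESULT = 0x08  # response

# version (u8), flags (u8), stream (signed i16), opcode (u8), body length (u32), big-endian
ENVELOPE_HEADER = struct.Struct(">BBhBI")


def encode_envelope(version: int, flags: int, stream: int, opcode: int, body: bytes) -> bytes:
    """Prefix body with a 9-byte envelope header (hypothetical helper)."""
    return ENVELOPE_HEADER.pack(version, flags, stream, opcode, len(body)) + body


class StreamTracker:
    """Hypothetical helper: assigns sequential stream IDs and tracks unanswered ones."""

    def __init__(self) -> None:
        self._next_id = 1
        self.pending: dict[int, str] = {}

    def checkout(self, label: str) -> int:
        """Reserve the next stream ID for a request (no wrap-around in this sketch)."""
        stream = self._next_id
        self._next_id += 1
        self.pending[stream] = label
        return stream

    def on_response(self, envelope: bytes) -> str:
        """Match a response envelope to its request by stream ID."""
        _version, _flags, stream, _opcode, _length = ENVELOPE_HEADER.unpack_from(envelope)
        # Raises KeyError if the server answers a stream we never issued.
        return self.pending.pop(stream)

    def missing(self) -> list[int]:
        """Stream IDs that have not received a response yet."""
        return sorted(self.pending)


if __name__ == "__main__":
    tracker = StreamTracker()
    tracker.checkout("request A")           # stream 1
    second = tracker.checkout("request B")  # stream 2
    # Simulate a server that answers only stream 2 (0x85 = v5 response version byte).
    tracker.on_response(encode_envelope(0x85, 0x00, second, OPCODE_RESULT, b""))
    print(tracker.missing())  # [1]
```

Keeping the pending map keyed by stream ID makes the "which stream never got 
an answer" question a single lookup, which is how the missing stream shows up 
in client-side logs.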

However, C* does not seem to respond to all requests sent this way. We have a 
[simple test|https://github.com/whatyouhide/xandra/pull/368] in our repo that 
reproduces this. It issues two requests in parallel (with stream IDs {{1}} 
and {{2}}) and then keeps issuing a new request as soon as a response arrives. 
Almost 100% of the time, we don't get the response on at least one stream. 
I've also attached debug logs from the client's perspective ({{xandra.log}}) 
in case they help. The {{<<56, 0, 2, 67, 161, ...>>}} syntax is Erlang's 
syntax for bytestrings, where each number is the decimal value of a single 
byte. The logs show that we never get the response frame on stream ID 1. In 
other runs it's stream ID 2, or 3, or another one.
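As a worked example of reading those bytes, the sketch below decodes the start 
of an uncompressed v5 outer frame header. It assumes the layout as we read it 
in the spec: the first 3 bytes form a little-endian field whose low 17 bits 
are the payload length and whose bit 17 is the self-contained flag, followed 
by a 3-byte CRC24 that this sketch does not check. 
{{decode_frame_header_prefix}} is a hypothetical helper name. Under that 
reading, {{<<56, 0, 2, ...>>}} is the field {{0x020038}}: a self-contained 
frame with a 56-byte payload.

```python
def decode_frame_header_prefix(header: bytes) -> tuple[int, bool]:
    """Decode payload length and self-contained flag from an uncompressed v5 frame header.

    Assumes the first 3 bytes are a little-endian field: bits 0-16 hold the
    payload length, bit 17 is the self-contained flag. The CRC24 that follows
    is not checked here.
    """
    if len(header) < 3:
        raise ValueError("need at least 3 header bytes")
    field = int.from_bytes(header[:3], "little")
    payload_length = field & 0x1FFFF          # low 17 bits
    self_contained = bool(field & (1 << 17))  # bit 17
    return payload_length, self_contained


# Erlang's <<56, 0, 2, 67, 161, ...>> corresponds to these leading bytes in Python.
print(decode_frame_header_prefix(bytes([56, 0, 2, 67, 161])))  # (56, True)
```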

I’m running out of ideas for what to try next on our end. I’ve also tried 
varying the socket buffer size (from {{10}} bytes to {{1000000}} bytes) to 
get the packets to split up in all sorts of places, but everything works as 
expected _except_ for the responses that never come out of C*.
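The buffer-size experiment relies on the reader reassembling frames no matter 
where the TCP stream is cut. A minimal sketch of such a reader follows, 
assuming uncompressed v5 framing: a 6-byte header (3-byte length/flag field 
plus 3-byte CRC24), the payload, then a 4-byte CRC32 trailer. 
{{FrameReassembler}}, {{make_test_frame}}, and {{read_in_chunks}} are 
hypothetical names, CRCs are skipped rather than validated, and the test 
frames use zeroed CRC fields. Feeding the same bytes 1, 10, or 1000000 bytes 
at a time yields the same payloads, which matches what we observed on the 
client side.

```python
HEADER_LEN = 6   # assumed: 3-byte length/flag field + 3-byte CRC24 (uncompressed frames)
TRAILER_LEN = 4  # assumed: CRC32 of the payload


class FrameReassembler:
    """Hypothetical reader: buffers arbitrary socket chunks and emits whole frame payloads."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        self._buf += chunk
        payloads = []
        while len(self._buf) >= HEADER_LEN:
            field = int.from_bytes(self._buf[:3], "little")
            payload_len = field & 0x1FFFF
            total = HEADER_LEN + payload_len + TRAILER_LEN
            if len(self._buf) < total:
                break  # frame not complete yet; wait for more bytes
            payloads.append(bytes(self._buf[HEADER_LEN:HEADER_LEN + payload_len]))
            del self._buf[:total]  # CRCs are skipped, not validated, in this sketch
        return payloads


def make_test_frame(payload: bytes) -> bytes:
    """Build a self-contained frame with zeroed CRC fields (test data only)."""
    field = (len(payload) | (1 << 17)).to_bytes(3, "little")
    return field + bytes(3) + payload + bytes(TRAILER_LEN)


def read_in_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Feed data to a fresh reassembler chunk_size bytes at a time."""
    reader = FrameReassembler()
    payloads: list[bytes] = []
    for start in range(0, len(data), chunk_size):
        payloads.extend(reader.feed(data[start:start + chunk_size]))
    return payloads


wire = make_test_frame(b"first") + make_test_frame(b"second payload")
for size in (1, 10, 1_000_000):
    assert read_in_chunks(wire, size) == [b"first", b"second payload"]
```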

Any other help is appreciated, but I've started to suspect this might be an 
issue in C*. It could totally not be, but I figured it was worth posting here.

Thank you all in advance folks! 💟


