Re: TOO_MANY_KEY_UPDATES error with TLS

2023-04-12 Thread Elliott Sims via user
Update to this: per https://github.com/openssl/openssl/issues/8068 it looks like BoringSSL should avoid this issue, so it may be related to client behavior of some sort. It's unclear to me from the message whether it's intra-cluster traffic or client/cluster traffic generating the error. On

TOO_MANY_KEY_UPDATES error with TLS

2023-04-12 Thread Elliott Sims via user
A few weeks ago, we rolled out TLS among hosts in our clusters (running 4.0.7). More recently we also rolled out TLS between Cassandra clients and the cluster. Today, we started seeing a lot of dropped actions in one cluster that correlate with warnings like this: WARN

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-12 Thread Ralph Boehme
On 4/12/23 15:30, Jeff Jirsa wrote: Are you always inserting into the same partition (with contention) or different? I'm actually updating the very same row. :) Which version are you using? # nodetool version ReleaseVersion: 4.1.1 The short tldr is that the failure modes of the

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-12 Thread Jeff Jirsa
Are you always inserting into the same partition (with contention) or different? Which version are you using? The short tldr is that the failure modes of the existing Paxos implementation (under contention, under latency, under cluster strain) can cause undefined states. I believe that a

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-12 Thread Ralph Boehme
On 4/11/23 21:14, Ralph Boehme wrote: On 4/11/23 19:53, Bowen Song via user wrote: That error message sounds like one of the nodes timed out in the Paxos propose stage. You can check the system.log and gc.log and see if you can find anything unusual in them, such as network errors, out of