Your table schema may have background (non-blocking) read repair set to 10%
(`dclocal_read_repair_chance = 0.1`).
You may have GC pauses causing writes (or reads) to be delayed.
You may be hitting a Cassandra bug.
I would need the `TRACING` output to know for sure.
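To illustrate what a 10% background read repair chance means in practice: in Cassandra 2.x/3.x, my understanding is that each read draws one random number and compares it against the table's `read_repair_chance` and `dclocal_read_repair_chance` options. The sketch below models that decision in Python. It is a simplified model, not Cassandra's actual code, and the enum and function names are hypothetical.

```python
import random
from enum import Enum


class ReadRepairDecision(Enum):
    """Hypothetical names for the possible outcomes of a single read."""
    NONE = "none"          # no background repair
    GLOBAL = "global"      # background repair across all DCs
    DC_LOCAL = "dc_local"  # background repair in the coordinator's DC only


def read_repair_decision(read_repair_chance: float,
                         dclocal_read_repair_chance: float,
                         rng: random.Random) -> ReadRepairDecision:
    # Simplified model: one uniform draw per read, compared against both options.
    roll = rng.random()  # uniform in [0.0, 1.0)
    if read_repair_chance > roll:
        return ReadRepairDecision.GLOBAL
    if dclocal_read_repair_chance > roll:
        return ReadRepairDecision.DC_LOCAL
    return ReadRepairDecision.NONE


def repair_fraction(reads: int, global_chance: float, local_chance: float,
                    seed: int = 42) -> float:
    """Fraction of simulated reads that trigger any background repair."""
    rng = random.Random(seed)
    repaired = sum(
        read_repair_decision(global_chance, local_chance, rng)
        is not ReadRepairDecision.NONE
        for _ in range(reads)
    )
    return repaired / reads


if __name__ == "__main__":
    # Assumed settings: read_repair_chance = 0.0, dclocal_read_repair_chance = 0.1
    print(repair_fraction(100_000, 0.0, 0.1))  # close to 0.1
```

Under this model, roughly one read in ten would kick off a repair in the local DC, which is why a background repair can appear to show up intermittently.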
On Mon, Aug 10, 2020 at 10:10 PM Tobias Eriksson <
tobi
Hi
We have a Cassandra solution with 2 DCs, where each DC has more than 30 nodes.
From time to time we see problems with READ REPAIR, but I am stuck with the
analysis.
We see a pattern in these faults, where we:
1. INSERT with LOCAL_QUORUM (2 out of 3)
2. Wait for a 0.5 to 1 second time window
3.
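With 3 replicas per DC, LOCAL_QUORUM means floor(3/2) + 1 = 2 replicas must acknowledge the write. Whether a later read is guaranteed to see that write depends on the read's consistency level, which is cut off in the quoted step 3: the read is only guaranteed to overlap the write if W + R > RF within the same DC. Here is a minimal sketch of that arithmetic, assuming RF = 3 per DC. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM / LOCAL_QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replica_sets_overlap(write_replicas: int, read_replicas: int,
                         replication_factor: int) -> bool:
    """True if every possible read set shares at least one replica with
    every possible write set, i.e. W + R > RF."""
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    rf = 3                # assumed replication factor per DC
    w = quorum(rf)        # LOCAL_QUORUM write: 2 of 3
    print(w)                                        # 2
    print(replica_sets_overlap(w, quorum(rf), rf))  # True  (LOCAL_QUORUM read)
    print(replica_sets_overlap(w, 1, rf))           # False (LOCAL_ONE read)
```

Note that this guarantee only holds when the read is served in the same DC as the write. A LOCAL_QUORUM write only waits for replicas in its local DC, so a read coordinated in the other DC can still miss it.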
Pushpendra,
I would recommend using an Ingress service such as Envoy or Kong to manage
your communication. Besides network connection management, you also get
other features, such as security through mTLS. I wrote a short blog post about
this, which will hopefully be going up on The New Stack soon.
Chris Bradford also
Use the service as the contact point, not the pod IPs, since those are ephemeral.
Even when all pods get replaced, they will still be reachable via the
service. Cheers!
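The service name is resolved when the client connects, so it keeps tracking whichever pods are currently live. A hard-coded pod IP goes stale as soon as the pod is rescheduled. (With a headless service, the name resolves to the current pod IPs.) Below is a small standard-library Python sketch of that resolution step. The function name is hypothetical, and 9042 is assumed to be the CQL native port. In practice you would simply pass the service name to your driver as its contact point rather than resolving it yourself.

```python
import socket
from typing import List


def resolve_contact_points(service_name: str, port: int = 9042) -> List[str]:
    """Resolve a service DNS name to the addresses it currently points at.

    Resolving at connect time reflects whichever pods are live now, instead
    of IPs captured when the configuration was written.
    """
    infos = socket.getaddrinfo(service_name, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0]
    # is the IP address string. Deduplicate while preserving order.
    addresses: List[str] = []
    for *_, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    # Inside a cluster you would pass the service's DNS name here (for example
    # a hypothetical "cassandra.default.svc.cluster.local"); a literal address
    # is used so this demo runs anywhere.
    print(resolve_contact_points("127.0.0.1"))
```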