Cassandra 3.11.4 node: load starts to increase to 40 after a few minutes on a 4-CPU machine

2019-10-16 Thread Sergio Bilello
Hello guys! I performed a thread dump https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMTAvMTcvLS1kdW1wLnR4dC0tMC0zMC00MA== while trying to join the node with -Dcassandra.join_ring=false OR -Dcassandra.join.ring=false OR -Djoin.ring=false, because the node spiked in load and latency…
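
For reference, the property Cassandra actually honors is cassandra.join_ring; the other spellings are silently ignored. A minimal sketch of starting a node outside the ring and joining it later, assuming a package install where JVM options live in cassandra-env.sh:

    # In cassandra-env.sh (assumed location; adjust for your install):
    JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"

    # Later, once the node looks healthy, have it join the ring:
    nodetool join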

Re: Constant blocking read repair for such a tiny table

2019-10-16 Thread Patrick Lee
Doesn't seem to be the same; it looks like just under 10% of the read traffic. The query I originally posted was one that we captured and used as an example. Every time I would run it at LOCAL_QUORUM, ALL, QUORUM... it would do a read repair. The record hasn't been updated for a long time…

Re: Constant blocking read repair for such a tiny table

2019-10-16 Thread Jeff Jirsa
The only way you're going to figure this out is to run with tracing and find a key that is definitely being repaired multiple times. Is it always the same instance? Is it random instances? You're suggesting blocking RR despite no mismatches, which basically implies something is digesting incorrectly.
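
A minimal tracing sketch from cqlsh, with placeholder keyspace, table, and key; run it repeatedly against the same key and compare which replicas return mismatched digests in the trace output:

    cqlsh -e "CONSISTENCY LOCAL_QUORUM;
              TRACING ON;
              SELECT * FROM my_keyspace.my_table WHERE id = 'suspect-key';"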

Re: nodetool rebuild on non-empty nodes?

2019-10-16 Thread Voytek Jarnot
Apologies for the bump, but I'm wondering if anyone has any thoughts on the question below, specifically about running nodetool rebuild on a destination that has data that does not exist in the source. Thanks. On Wed, Sep 11, 2019 at 2:41 PM Voytek Jarnot wrote: > Pardon the convoluted scenario…

Re: Constant blocking read repair for such a tiny table

2019-10-16 Thread Patrick Lee
We do have otc_coalescing_strategy off; we ran into that a long while back, where we saw better performance with it disabled. And most recently, disk_access_mode set to mmap_index_only, as we have a few clusters where we would experience a lot more disk IO causing high load and high CPU, and so latencies were crazy…
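
A quick way to confirm what a node is actually running with (the config path is an assumption; it varies by install), plus example values matching the setup described above:

    grep -E 'otc_coalescing_strategy|disk_access_mode' /etc/cassandra/cassandra.yaml
    # e.g.:
    #   otc_coalescing_strategy: DISABLED
    #   disk_access_mode: mmap_index_only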

Configurations for better performance

2019-10-16 Thread Ramnatthan Alagappan
Hello, I am from a research group that's looking into automatic tuning of configuration parameters of data-intensive systems. In particular, we are interested in distributed data stores, and so Cassandra is a great candidate. To this end, we would like to know a bit about what configuration settings…

Re: cluster rolling restart

2019-10-16 Thread Jérémy SEVELLEC
Hi, I would say that I agree with Jon, Jeff and Alain at the same time ;-) Basically, you should be very comfortable doing it for a config change, a Cassandra version, or an OS update, but not because your cluster starts suffering from performance issues or something like that if you don't do it. If so, you should…

RE: Constant blocking read repair for such a tiny table

2019-10-16 Thread ZAIDI, ASAD
Wondering if you've disabled otc_coalescing_strategy (CASSANDRA-12676) since you've upgraded from 2.x? Also, have you had any luck increasing native_transport_max_threads to address blocked NTRs (CASSANDRA-11363)? ~Asad From: Patrick Lee…
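
To check whether blocked NTRs are actually occurring, a hedged sketch using nodetool's thread-pool stats (column layout varies slightly by version):

    nodetool tpstats | grep -i 'Native-Transport-Requests'
    # Non-zero "Blocked" / "All time blocked" counts suggest that raising
    # native_transport_max_threads (CASSANDRA-11363) may help.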

Re: Constant blocking read repair for such a tiny table

2019-10-16 Thread Patrick Lee
Haven't really figured this out yet. It's not a big problem, but it is annoying for sure! The cluster was upgraded from 2.1.16 to 3.11.4. Now my only uncertainty is that I'm not sure if it had this type of behavior before the upgrade. I'm leaning toward no based on my data, but I'm just not 100% sure…

Re: Elevated response times from all nodes in a data center at the same time.

2019-10-16 Thread Reid Pinchback
Something else came to mind. You're on AWS. You always have to keep noisy-neighbor problems in the back of your mind when you aren't running on bare metal. Basically, either your usage pattern during these incidents is unchanged… or it is not unchanged. If it is unchanged, and the problem happens…

Re: Elevated response times from all nodes in a data center at the same time.

2019-10-16 Thread Jon Haddad
It's possible the queries you're normally running are served out of page cache, and during the latency spike you're hitting your disks. If you're using read ahead you might be hitting a throughput limit on the disks. I've got some numbers and graphs I can share later when I'm not on my phone. Jon
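
For what it's worth, a sketch of checking and lowering read-ahead (the device name is an assumption; 8 sectors = 4 KB is a commonly suggested value for Cassandra on fast disks):

    blockdev --getra /dev/nvme0n1        # current read-ahead, in 512-byte sectors
    sudo blockdev --setra 8 /dev/nvme0n1 # lower it so small random reads stay small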

Re: cluster rolling restart

2019-10-16 Thread Jon Haddad
I agree with Jeff here. Ideally you should be so comfortable with rolling restarts that they become second nature. Cassandra is designed to handle them and you should not be afraid to do them regularly. On Wed, Oct 16, 2019, 8:06 AM Jeff Jirsa wrote: > Personally I encourage you to rolling restart…

Re: cluster rolling restart

2019-10-16 Thread Jeff Jirsa
Personally, I encourage you to rolling restart from time to time; use it as an opportunity to upgrade kernels, JDKs, and Cassandra itself, and just generally make sure things are healthy and working how you expect. If you see latencies jump or timeouts when you're bouncing, that's a warning…
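
A minimal rolling-restart sketch, assuming SSH access, a systemd service named cassandra, and a placeholder host list; the key points are draining before the restart and waiting for the node to come back Up/Normal before moving on:

    for host in node1 node2 node3; do
      ssh "$host" 'nodetool drain && sudo systemctl restart cassandra'
      until ssh "$host" 'nodetool status | grep -q "^UN"'; do sleep 10; done
    done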

Re: Elevated response times from all nodes in a data center at the same time.

2019-10-16 Thread Alain RODRIGUEZ
Hello Bill, I think it might be worth it to focus the effort on diagnosing the issue properly in the first place, so I'll try to guide you through this. First, some comments on your environment: > AWS Regions: us-east-1 and us-west-2. Deployed over 3 availability zones in > each region. > No of Nodes…

Re: cluster rolling restart

2019-10-16 Thread Marco Gasparini
Great! Thank you very much Alain! On Wed, Oct 16, 2019 at 10:56 AM Alain RODRIGUEZ wrote: > Hello Marco, > > No, this should not be a 'normal' / 'routine' thing in a Cassandra cluster. > I can imagine it being helpful in some cases or versions of Cassandra if > there are memory issues…

Re: cluster rolling restart

2019-10-16 Thread Alain RODRIGUEZ
Hello Marco, No, this should not be a 'normal' / 'routine' thing in a Cassandra cluster. I can imagine it being helpful in some cases or versions of Cassandra if there are memory issues/leaks or something like that going wrong, but 'normally' you should not have to do that. Even more, when doing…

cluster rolling restart

2019-10-16 Thread Marco Gasparini
Hi all, I was wondering if it is recommended to perform a rolling restart of the cluster once in a while. Is it a good practice or necessary? How often? Thanks, Marco