Ec2MultiRegionSnitch difficulties (3.11.2)

2019-06-27 Thread Voytek Jarnot
Curious if anyone could shed some light on this. Trying to set up a 4-node, one DC (for now, same region, same AZ, same VPC, etc) cluster in AWS. All nodes have the following config (everything else basically standard): cassandra.yaml: listen_address: NODE?_PRIVATE_IP seeds:

Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Dmitry Simonov
> Is there an order in which the events you described happened, or is the order with which you presented them the order you notice things going wrong? At first, the Thrift thread count starts increasing. After 2 or 3 minutes the threads consume all CPU cores. After that, simultaneously: message drops

Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Avinash Mandava
Yeah, I skimmed too fast. Don't add more work if the CPU is pegged, and if you're using the Thrift protocol, NTR would not have values. Is there an order in which the events you described happened, or is the order with which you presented them the order you notice things going wrong? On Thu, Jun 27, 2019 at 1:29

Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Dmitry Simonov
Thanks for your reply! > Have you tried increasing concurrent reads until you see more activity in disk? When the problem occurs, 1.2k - 2k freshly created Thrift threads consume all CPU on all cores. Would increasing concurrent reads help in this situation? >

Re: Get information about GC pause (Stop the world) via JMX, it's possible ?

2019-06-27 Thread Avinash Mandava
Here are the metrics you want. It depends on which GC you're using, as Dimo said above. *1) If you're using CMS - Collection time / Collection count (Avg time per collection)* *ParNew* (java.lang.type=GarbageCollector.name=ParNew.CollectionTime /
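The "Collection time / Collection count" ratio mentioned in this reply can be sketched as follows. This is a minimal illustration, not code from the thread: the function name and the sample numbers are hypothetical, and it assumes the two JMX attributes have already been read (for example from a JMX browser) as cumulative counters, with the time in milliseconds. The result is an average time per collection, which is not necessarily identical to the stop-the-world time reported in gc.log.

```python
def avg_ms_per_collection(time_before_ms, count_before, time_after_ms, count_after):
    """Average milliseconds per collection between two samples.

    Both attributes are treated as cumulative counters, so the average
    over an interval is the delta of the time divided by the delta of
    the count. Returns None when no collection ran in the interval.
    """
    collections = count_after - count_before
    if collections <= 0:
        return None
    return (time_after_ms - time_before_ms) / collections


# Hypothetical values sampled twice from the same collector.
print(avg_ms_per_collection(12000, 400, 12450, 415))
```

Sampling twice and taking deltas gives the average for the recent interval; dividing the raw totals gives the average since the JVM started instead.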

Write count vs Local Write Count

2019-06-27 Thread raja k
Hello, Can anyone tell me the difference between Write Count and Local write count in the nodetool tablestats output? Below is what I see for one of my tables: Write Count: 248214002 Write Latency: 0.07470789510093795 ms. Local write count: 1183420 Local write latency: NaN ms Thanks,

Re: Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Avinash Mandava
Have you tried increasing concurrent reads until you see more activity in disk? If you've always got 32 active reads and high pending reads it could just be dropping the reads because the queues are saturated. Could be artificially bottlenecking at the C* process level. Also what does this metric

Re: Get information about GC pause (Stop the world) via JMX, it's possible ?

2019-06-27 Thread Dimo Velev
That is a standard JVM metric. Connect to your Cassandra node with a JMX browser (jconsole, jmc, ...) and browse the metrics. Depending on the garbage collector you use, they will be different, but they are there. On Thu, 27 Jun 2019, 13:47 Ahmed Eljami, wrote: > Hi, > > I want to know if it's

Get information about GC pause (Stop the world) via JMX, it's possible ?

2019-06-27 Thread Ahmed Eljami
Hi, I want to know if it's possible to get information about GC pause duration (Stop the world) via JMX. Today, we get this information from gc.log with the JVM option -XX:+PrintGCApplicationStoppedTime Total time for which application threads were stopped: 0.0001273 seconds, Stopping

Bursts of Thrift threads make cluster unresponsive

2019-06-27 Thread Dmitry Simonov
Hello! We've run into the following problem several times. Cassandra cluster (5 nodes) becomes unresponsive for ~30 minutes: - all CPUs have 100% load (normally we have LA 5 on 16-core machines) - Cassandra's thread count rises from 300 to 1300 - 2000, most of them are Thrift threads in