Re: Re: Re: Re: unconfigured table logtabl

2020-04-04 Thread David Ni
Thank you very much for your friendly note. ERROR [AntiEntropyStage:1] 2020-04-04 13:57:09,614 RepairMessageVerbHandler.java:177 - Table with id 21a3fa90-74c7-11ea-978a-b556b0c3a5ea was dropped during prepare phase of repair cassandra@cqlsh:system_schema> select keyspace_name,table_name,id f

one node down and cluster works better

2020-04-04 Thread Osman Yozgatlıoğlu
Hello, I manage one cluster with 2 DCs, 7 nodes each, and the replication factor is 2:2. My insertion performance dropped somehow. I restarted the nodes one by one and found that one node degrades performance. I confirmed it was this node after the problem recurred a couple of times. How can I continue to investigate? Regards,

Re: one node down and cluster works better

2020-04-04 Thread mehmet bursali
Hi Osman, do you use any monitoring solution such as Prometheus on your cluster? If yes, you should install and use the Cassandra exporter from the link below and examine some detailed metrics. https://github.com/criteo/cassandra_exporter

OOM only on one datacenter nodes

2020-04-04 Thread Surbhi Gupta
Hi, We have two datacenters with 5 nodes each and a replication factor of 3. We have traffic on DC1; DC2 is just for disaster recovery and receives no direct traffic. We are using machines with 24 CPUs and 128GB RAM. For DC1, where we have live traffic, we don't see any issue, however for DC2 wher

Re: one node down and cluster works better

2020-04-04 Thread Erick Ramirez
With only 2 replicas per DC, it means you're likely writing with a consistency level of either ONE or LOCAL_ONE. Every time you hit the problematic node, the write performance drops. All other configurations being equal, this indicates an issue with the commitlog disk on the node. Get your sysadmin
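The reasoning about consistency levels above follows from quorum arithmetic: a quorum is a majority of the replicas, so with 2 replicas a quorum needs both of them and cannot ride out a single slow or down node. A minimal sketch of that calculation (the function names are illustrative, not part of any Cassandra API):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge for a quorum: a strict majority."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be unavailable while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (2, 3):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerated failures={tolerated_failures(rf)}")
```

With a replication factor of 2, a quorum-level write would have to wait on every replica, which is why a cluster that keeps working with one node restarted is probably using ONE or LOCAL_ONE.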

Re: OOM only on one datacenter nodes

2020-04-04 Thread Erick Ramirez
With a lack of heapdump for you to analyse, my hypothesis is that your DC2 nodes are taking on traffic (from some client somewhere) but you're just not aware of it. The hints replay is just a side-effect of the nodes getting overloaded. To rule out my hypothesis in the first instance, my recommend

Re: Re: Re: Re: unconfigured table logtabl

2020-04-04 Thread Erick Ramirez
This is confirmation that you have a schema disagreement in your cluster: - 21a3fa90-74c7-11ea-978a-b556b0c3a5ea = Thursday, April 2 02:48:39 PT - 830028a0-7584-11ea-a277-bdf3d1289bdd = Friday, April 3 01:24:18 PT The schema on the node where you ran that query has an older version of the tab
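The two table ids quoted above are version-1 (time-based) UUIDs, so the moment each one was generated can be read back from the id itself. A minimal sketch using only the Python standard library (the helper name `timeuuid_to_utc` is an illustration, not a Cassandra or driver function):

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100-nanosecond intervals
# since the start of the Gregorian calendar, 1582-10-15 00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_utc(value: str) -> datetime:
    """Return the UTC creation time embedded in a version-1 UUID."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError(f"{value} is not a version-1 (time-based) UUID")
    # parsed.time is in 100 ns units; integer-divide by 10 for microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


if __name__ == "__main__":
    for table_id in (
        "21a3fa90-74c7-11ea-978a-b556b0c3a5ea",
        "830028a0-7584-11ea-a277-bdf3d1289bdd",
    ):
        print(table_id, "->", timeuuid_to_utc(table_id).isoformat())
```

The printed times are in UTC; comparing them shows which copy of the table definition is the older one.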

Re: OOM only on one datacenter nodes

2020-04-04 Thread Reid Pinchback
Surbhi: If you aren’t seeing connection activity in DC2, I’d check to see if the operations hitting DC1 are quorum ops instead of local quorum. That still wouldn’t explain DC2 nodes going down, but would at least explain them doing more work than might be on your radar right now. The hint repl
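The difference between QUORUM and LOCAL_QUORUM can be made concrete with the numbers from this thread (two datacenters, replication factor 3 in each). QUORUM is a majority of all replicas across datacenters, while LOCAL_QUORUM is a majority of the replicas in the coordinator's datacenter only. A minimal sketch, assuming the datacenter names DC1 and DC2 and a hypothetical helper `remote_acks_needed`:

```python
def quorum(replicas: int) -> int:
    """A strict majority of the given replica count."""
    return replicas // 2 + 1


def remote_acks_needed(consistency: str, rf_by_dc: dict, local_dc: str) -> int:
    """Minimum acknowledgements that must come from outside local_dc."""
    if consistency == "QUORUM":
        required = quorum(sum(rf_by_dc.values()))  # majority of ALL replicas
    elif consistency == "LOCAL_QUORUM":
        required = quorum(rf_by_dc[local_dc])      # majority of local replicas
    else:
        raise ValueError(f"unsupported consistency level: {consistency}")
    # Whatever the local replicas cannot cover must be acknowledged remotely.
    return max(0, required - rf_by_dc[local_dc])


if __name__ == "__main__":
    rf = {"DC1": 3, "DC2": 3}  # assumed layout, taken from the thread
    for level in ("QUORUM", "LOCAL_QUORUM"):
        print(level, "->", remote_acks_needed(level, rf, "DC1"),
              "ack(s) required from DC2")
```

With 3 replicas in each datacenter, QUORUM needs 4 of 6 acknowledgements, so at least one must come from DC2 on every such request, whereas LOCAL_QUORUM can be satisfied entirely within DC1.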