Re: nodetool status shows large numbers of up nodes are down

2015-02-10 Thread Carlos Rolo
Can you run nodetool tpstats and check whether there are pending requests on GossipStage? The timeout should not affect gossip (AFAIK). As for the problems this state can cause: if your nodes are marked down for a long time and you are using hinted handoff, your hints may not be delivered and your dat
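The GossipStage check suggested above can be scripted. The following is a minimal sketch, not part of the original thread: it assumes the tpstats output has been captured as text and that each pool row is laid out as pool name, Active, Pending, then further columns (the layout may differ between Cassandra versions). The function name `gossip_pending` is hypothetical.

```python
from typing import Optional


def gossip_pending(tpstats_output: str) -> Optional[int]:
    """Return the Pending count for GossipStage, or None if the row is absent.

    Assumes each pool row is whitespace-separated as:
    <pool name> <active> <pending> ...
    """
    for line in tpstats_output.splitlines():
        fields = line.split()
        # Match the pool name exactly so similarly named rows are ignored.
        if len(fields) >= 3 and fields[0] == "GossipStage":
            try:
                return int(fields[2])
            except ValueError:
                # The row did not have a numeric Pending column where expected.
                return None
    return None
```

A value that stays above zero across repeated runs would suggest gossip messages are backing up on that node.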

Re: nodetool status shows large numbers of up nodes are down

2015-02-10 Thread Chris Lohfink
Are you hitting long GCs on your nodes? You can check the GC log or look in the Cassandra log for GCInspector messages. Chris On Tue, Feb 10, 2015 at 1:28 PM, Cheng Ren wrote: > Hi Carlos, > Thanks for your suggestion. We did check the NTP setting and clock, and > they are all working normally. Schema versions are
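As a rough illustration of the GCInspector check mentioned at the start of this message, here is a sketch that scans log lines for long pauses. It is not from the thread: it assumes each GCInspector line reports the pause as a number followed by "ms", and the exact message wording varies between Cassandra versions. The function name `long_gc_pauses` and the 1000 ms default threshold are hypothetical choices.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the pause duration appears as "<digits> ms" or "<digits>ms".
_PAUSE_RE = re.compile(r"(\d+)\s*ms\b")


def long_gc_pauses(lines: Iterable[str], threshold_ms: int = 1000) -> List[Tuple[int, str]]:
    """Return (pause_ms, line) pairs for GCInspector lines at or above the threshold."""
    hits = []
    for line in lines:
        if "GCInspector" not in line:
            continue
        match = _PAUSE_RE.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits
```

It can be fed an open log file directly, for example `long_gc_pauses(open(path))`, where `path` is wherever the node writes its system log.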

Re: nodetool status shows large numbers of up nodes are down

2015-02-10 Thread Cheng Ren
Hi Carlos, Thanks for your suggestion. We did check the NTP settings and clocks, and they are all working normally. Schema versions are also consistent with peers'. BTW, the only change we made was to set some nodes' request timeouts (read_request_timeout, write_request_timeout, range_request_timeou

Re: nodetool status shows large numbers of up nodes are down

2015-02-09 Thread Carlos Rolo
Hi Cheng, Are all machines configured with NTP and all clocks in sync? If not, set that up. Clocks that are out of sync can cause weird issues like the ones you see, as well as schema disagreements and in some cases corrupted data. Regards, Carlos Juzarte Rolo Cassandr

nodetool status shows large numbers of up nodes are down

2015-02-09 Thread Cheng Ren
Hi, We have a two-DC cluster with 21 nodes in one DC and 27 nodes in the other. Over the past few months, we have seen nodetool status mark 4-8 nodes as down while they are actually functioning. In particular, today we noticed that running nodetool status on some nodes shows a higher number of nodes down than be