Marcin,
Are all your nodes within the same region? If they are not in the same region,
which snitch type are you using?
Jan/ 


On Thursday, April 2, 2015 3:28 AM, Michal Michalski <michal.michal...@boxever.com> wrote:

Hey Marcin,

Are they actually going up and down repeatedly (flapping), or do they just go
down and never come back? There might be different reasons for flapping nodes,
but to list what I have off the top of my head right now:

1. Network issues. I don't think it's your case, but you can read about the
   issues some people are having when deploying C* on AWS EC2 (keyword to look
   for: phi_convict_threshold).
2. Heavy load. The node is under heavy load because of a massive number of
   reads / writes / bulkloads or e.g. unthrottled compaction, which may result
   in extensive GC.

Could any of these be a problem in your case? I'd start by investigating the GC
logs, e.g. to see how long the "stop the world" full GC takes (GC logs should
be on by default from what I can see [1]).

[1] https://issues.apache.org/jira/browse/CASSANDRA-5319
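
A minimal sketch of the GC log check suggested above, in Python. It assumes the JVM was started with -XX:+PrintGCApplicationStoppedTime, so that the log contains "Total time for which application threads were stopped" lines; the 1-second threshold, the sample lines and the log path in the comment are illustrative assumptions, not values from this thread.

```python
import re

# Line format written by HotSpot when -XX:+PrintGCApplicationStoppedTime is set.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: (\d+\.\d+) seconds"
)


def long_pauses(lines, threshold_s=1.0):
    """Return (pause_seconds, raw_line) for every pause >= threshold_s."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match is None:
            continue
        seconds = float(match.group(1))
        if seconds >= threshold_s:
            pauses.append((seconds, line.rstrip()))
    return pauses


# Made-up sample lines, for illustration only.
sample = [
    "2015-04-02T09:00:01.000+0000: 10.500: Total time for which "
    "application threads were stopped: 0.0123450 seconds",
    "2015-04-02T09:00:07.000+0000: 16.500: Total time for which "
    "application threads were stopped: 2.5012340 seconds",
    "2015-04-02T09:00:08.000+0000: 17.500: [GC some unrelated line]",
]

pauses = long_pauses(sample)
for seconds, raw in pauses:
    print(f"{seconds:.3f}s pause: {raw}")

# Against a real file (the path is an assumption, adjust to your install):
# with open("/var/log/cassandra/gc.log") as f:
#     pauses = long_pauses(f, threshold_s=1.0)
```

Pauses of several seconds that line up with the times a node was marked DN would point towards the heavy load / GC explanation rather than the network one.
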
Michał

Kind regards,
Michał Michalski
michal.michal...@boxever.com

On 2 April 2015 at 11:05, Marcin Pietraszek <mpietras...@opera.com> wrote:

Hi!

We have a 56-node cluster with C* 2.0.13 + the CASSANDRA-9036 patch
installed. Assume we have nodes A, B, C, D, E. At irregular intervals
one of those nodes starts to report that a subset of the other nodes is in
DN state, although the C* daemon is running on all nodes:

A$ nodetool status
UN B
DN C
DN D
UN E

B$ nodetool status
UN A
UN C
UN D
UN E

C$ nodetool status
DN A
UN B
UN D
UN E
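
The three views above can be cross-checked mechanically. The following is a minimal Python sketch that works on the simplified "STATE NODE" lines shown in this message, not on real nodetool output (which has more columns); parse_view and find_disagreements are hypothetical helper names introduced only for illustration.

```python
def parse_view(text):
    """Parse simplified lines such as 'UN B' into {node: state}."""
    view = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) == 2:
            state, node = parts
            view[node] = state
    return view


def find_disagreements(views):
    """Return {target: {observer: state}} for every target node whose
    reported state differs between at least two observers."""
    by_target = {}
    for observer, view in views.items():
        for target, state in view.items():
            by_target.setdefault(target, {})[observer] = state
    return {
        target: seen
        for target, seen in by_target.items()
        if len(set(seen.values())) > 1
    }


# The three views quoted in this message.
views = {
    "A": parse_view("UN B\nDN C\nDN D\nUN E"),
    "B": parse_view("UN A\nUN C\nUN D\nUN E"),
    "C": parse_view("DN A\nUN B\nUN D\nUN E"),
}

disagreements = find_disagreements(views)
for target in sorted(disagreements):
    print(target, disagreements[target])
```

For the views above this flags A, C and D: every disagreement involves node A, either as the observer marking C and D down or as the target that C marks down.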

After restarting node A, C and D report that A is in UN state, and A also
reports that the whole cluster is in UN state. Right now I don't have any
clear steps to reproduce that situation. Do you guys have any idea
what could be causing such behaviour? How could this be prevented?

It seems that when node A is the coordinator and gets a request for some
data replicated on C and D, it responds with an Unavailable
exception. After restarting A, that problem disappears.

--
mp




  
