(sorry for the delay in following up on this thread)

> Actually, there's a question - is it 'acceptable' do you think
> for GC to take out a small number of your nodes at a time,
> so long as the bulk (or at least where RF is > nodes gone
> on STW GC) of the nodes are okay? I suspect this is a
> question peculiar to Amazon EC2, as I've never seen a box
> rendered non-communicative by a single core flat-lining.
Well, first of all I still find it very strange that GC takes nodes down at all, unless you are specifically putting sufficient CPU load on the cluster that e.g. concurrent GC causes a problem. In particular, if you're still seeing those crazy long GC pause times, IMO something is severely wrong, and I would not personally recommend going to production with that unresolved, since whatever the cause is may suddenly start having other effects.

Severely long ParNew pause times are really not expected; the only two major causes I can think of, at least when running on real hardware and barring JVM bugs, are (1) swapping, and (2) possibly extreme performance penalties associated with a very full old generation, in which case the solution is "larger heap". I don't remember whether you indicated any heap statistics, so I'm not sure whether (2) is a possibility. But I would expect OutOfMemory errors long before a ParNew takes 300+ *seconds*, just out of JVM policies w.r.t. acceptable GC efficiency.

Bottom line: 300+ seconds for a ParNew collection is *way way way* out there. 300 *milli*-seconds is more along the lines of what one might expect (usually lower than that). Even if you can seemingly lessen the impact by using the throughput collector, I wouldn't be comfortable with shrugging off whatever is happening.

That said, in terms of the effects on the cluster: I have not had much hands-on experience with this, but I believe you'd expect a definite visible impact from the point of view of clients. Cassandra is not optimized for instantly detecting slow nodes and transparently working around them with zero impact on clients; I don't think it is recommended to run a cluster with nodes regularly bouncing in and out, for whatever reason, if it can be avoided.

Not sure what else to say, other than to strongly recommend getting to the bottom of this problem, which seems non-specific to Cassandra, before relying on the system in a production setting.
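As a concrete first diagnostic step (a sketch, not a prescription; the JVM flags assume a Sun/Oracle HotSpot JVM and the log path is just an example), I'd enable GC logging to see the real pause breakdown, and check whether the box is swapping while the pauses happen, since that would distinguish hypothesis (1) from (2):

```shell
# HotSpot GC logging options (e.g. added to JVM_OPTS in cassandra-env.sh):
#   -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
#   -Xloggc:/var/log/cassandra/gc.log
# Each ParNew entry in the log then shows the actual pause duration,
# so a 300+ second pause would be unambiguous in the log.

# On the Linux side, the pswpin/pswpout counters in /proc/vmstat only
# grow when pages are actually swapped in/out; sample them before and
# after a pause (or run `vmstat 1` during load and watch the si/so
# columns) to see whether swapping coincides with the long pauses:
grep -E '^pswp(in|out)' /proc/vmstat
```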
The extremity of the issues you're seeing is far beyond what I would ever expect, even allowing for "who knows what EC2 is doing or what other people are running on the machine", except under the hypothesis that they over-commit memory and the extreme latencies are due to swapping. But if that is what is happening, it just tells me that EC2 is unusable for this type of thing; and I still think it's far-fetched, since the impact should be significant for a great number of their customers. I forget, and I didn't find it by briefly sifting through the thread history: were you running on small EC2 instances or larger ones?

-- 
/ Peter Schuller