Thanks, extremely helpful. The key bit was that I wasn't dropping and recreating the old keyspace before re-running the stress test, so I was stuck at RF = 1 from a previous run despite passing RF = 2 to the stress tool.
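For anyone hitting the same thing: a keyspace's replication factor is fixed at creation (until altered), so re-running the stress tool with a different RF has no effect on an existing keyspace. A minimal sketch of the fix, using modern CQL syntax (the keyspace name `stress_ks` is a placeholder; exact syntax varies by Cassandra version):

```sql
-- Remove the stale keyspace left over from the previous run (RF = 1).
DROP KEYSPACE IF EXISTS stress_ks;

-- Recreate it with the replication factor you actually want to test.
CREATE KEYSPACE stress_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
```

Alternatively, `ALTER KEYSPACE ... WITH replication = {...}` changes RF in place, followed by a repair so existing data gets replicated.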
On Sun, Oct 28, 2012 at 2:49 AM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > Operation [158320] retried 10 times - error inserting key 0158320
> > ((UnavailableException))
>
> This means that at the point where the thrift request to write data
> was handled, the co-ordinator node (the one your client is connected
> to) believed that, among the replicas responsible for the key, too
> many were down to satisfy the consistency level. Most likely causes
> would be that you're in fact not using RF 2 (e.g., is the RF really
> > 1 for the keyspace you're inserting into), or you're in fact not
> using ONE.
>
> > I'm sure my naive setup is flawed in some way, but what I was hoping for
> > was that when the node went down it would fail to write to the downed node
> > and instead write to one of the other nodes in the cluster. So the question
> > is why writes are failing even after a retry. It might be that the stress
> > client doesn't pool connections (I took
>
> Writes always go to all responsible replicas that are up, and when
> enough return (according to consistency level), the insert succeeds.
>
> If replicas fail to respond you may get a TimeoutException.
>
> UnavailableException means it didn't even try, because it didn't have
> enough replicas to even try to write to.
>
> (Note though: Reads are a bit of a different story, and if you want to
> test behavior when nodes go down I suggest including that. See
> CASSANDRA-2540 and CASSANDRA-3927.)
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)