Thanks, extremely helpful. The key bit was that I wasn't dropping and recreating the old keyspace before re-running the stress test, so I was stuck at RF = 1 from a previous run despite passing RF = 2 to the stress tool.
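For anyone hitting the same thing: a keyspace's replication factor is fixed at creation (until altered), so re-running the stress tool with a different RF has no effect on an existing keyspace. A minimal sketch of the fix, using modern CQL syntax (the keyspace name `stress_ks` is a placeholder; exact syntax varies by Cassandra version):

```sql
-- Remove the stale keyspace left over from the previous run (RF = 1).
DROP KEYSPACE IF EXISTS stress_ks;

-- Recreate it with the replication factor you actually want to test.
CREATE KEYSPACE stress_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
```

Alternatively, `ALTER KEYSPACE ... WITH replication = {...}` changes RF in place, followed by a repair so existing data gets replicated.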
On Sun, Oct 28, 2012 at 2:49 AM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > Operation [158320] retried 10 times - error inserting key 0158320
> > ((UnavailableException))
>
> This means that at the point where the thrift request to write data
> was handled, the co-ordinator node (the one your client is connected
> to) believed that, among the replicas responsible for the key, too
> many were down to satisfy the consistency level. Most likely causes
> would be that you're in fact not using RF 2 (e.g., is the RF really
> > 1 for the keyspace you're inserting into), or you're in fact not
> using ONE.
>
> > I'm sure my naive setup is flawed in some way, but what I was hoping for
> > was that when the node went down it would fail to write to the downed node
> > and instead write to one of the other nodes in the cluster. So the question
> > is why writes are failing even after a retry. It might be that the stress
> > client doesn't pool connections (I took
>
> Writes always go to all responsible replicas that are up, and when
> enough return (according to consistency level), the insert succeeds.
>
> If replicas fail to respond you may get a TimeoutException.
>
> UnavailableException means it didn't even try, because it didn't have
> enough replicas to even try to write to.
>
> (Note though: Reads are a bit of a different story, and if you want to
> test behavior when nodes go down I suggest including that. See
> CASSANDRA-2540 and CASSANDRA-3927.)
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)