In our case, the timeouts were happening because internode authentication was 
turned on and, by default, the user column family in the system_auth keyspace 
is replicated to only 1 node. As a result, every authentication request went to 
that single node. We fixed this by setting the replication factor on the 
system_auth keyspace to n (# of nodes). We also had to raise 
permissions_validity_in_ms from its default of 2000 ms to a larger value.
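
For anyone who wants to apply the same replication change, here is a rough 
sketch of the CQL involved, wrapped in a small Python helper so the node count 
can be filled in. The helper name and the 6-node count are made up for 
illustration, and SimpleStrategy is an assumption that only fits a 
single-datacenter cluster; a multi-datacenter cluster would likely want 
NetworkTopologyStrategy with per-datacenter factors instead.

```python
def build_system_auth_replication_cql(node_count: int) -> str:
    """Return a CQL statement that sets system_auth's replication factor.

    Hypothetical helper: it only builds the statement text and does not
    connect to a cluster. SimpleStrategy is assumed (single datacenter).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (
        "ALTER KEYSPACE system_auth WITH replication = "
        "{'class': 'SimpleStrategy', "
        f"'replication_factor': {node_count}}};"
    )


if __name__ == "__main__":
    # Assumed 6-node cluster; paste the printed statement into cqlsh.
    print(build_system_auth_replication_cql(6))
```

After changing the replication factor, existing auth data is not copied to the 
new replicas automatically, so running "nodetool repair system_auth" on each 
node is normally needed. The permissions_validity_in_ms setting lives in 
cassandra.yaml.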

Hope this helps.

Parag

From: Robert Coli <rc...@eventbrite.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, November 24, 2014 at 2:52 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: What causes NoHostAvailableException, WriteTimeoutException, and 
UnavailableException?

On Mon, Nov 24, 2014 at 12:57 PM, Kevin Burton <bur...@spinn3r.com> wrote:
I’m trying to track down some exceptions in our production cluster.  I bumped 
up our write load and now I’m getting a non-trivial number of these exceptions, 
somewhere on the order of 100 per hour.

All machines have a somewhat high CPU load because they’re doing other tasks.  
I’m worried that perhaps my background tasks are just overloading Cassandra, 
and one way to mitigate this is to nice them to the least favorable priority 
(this is my first task).

Two out of three of them are timeouts or lack of availability. Seeing this 
across your cluster is usually associated with hitting a "pre-fail" condition 
in terms of GC, where the amount of data stored per node makes the steady state 
working set larger than available non-fragmented heap. If you're graphing GC 
time, I would expect to see a concomitant spike there.

=Rob
