So I tried to run a repair with the following on one of the servers:

nodetool repair system_auth -pr -local
After two hours it hadn’t finished. I had to kill the repair because of another issue and haven’t tried again.

Why would such a small table take so long to repair? Also, what would happen if I set the RF back to a lower number like 5?

Thanks

From: <li...@beobal.com> on behalf of Sam Tunnicliffe <s...@beobal.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, August 30, 2017 at 10:10 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: system_auth replication factor in Cassandra 2.1

It's a better rule of thumb to use an RF of 3 to 5 per DC and this is what the docs now suggest:
http://cassandra.apache.org/doc/latest/operating/security.html#authentication

Out of the box, the system_auth keyspace is set up with SimpleStrategy and RF=1 so that it works on any new system, including dev & test clusters, but obviously that's no use for a production system.

Regarding the increased rate of authentication errors: did you run repair after changing the RF? Auth queries are done at CL.LOCAL_ONE, so if you haven't repaired, the data for the user logging in will probably not be where it should be. The exception to this is the default "cassandra" user; queries for that user are done at CL.QUORUM, which will indeed lead to timeouts and authentication errors with a very high RF. It's recommended to use that default user only to bootstrap the setup of your own users & superusers; the link above also has info on this.

Thanks,
Sam

On 30 August 2017 at 16:50, Chuck Reynolds <creyno...@ancestry.com> wrote:

So I’ve read that if you're using authentication in Cassandra 2.1, your replication factor should match the number of nodes in your datacenter. Is that true?

I have a two-datacenter cluster: 135 nodes in datacenter 1 & 227 nodes in an AWS datacenter.

Why do I want to replicate the system_auth table that many times?
What are the benefits and disadvantages of matching the number of nodes as opposed to the standard replication factor of 3?

The reason I’m asking is that it seems like I’m getting a lot of authentication errors now, and they seem to happen more under load.

Also, querying the system_auth table from cqlsh to get the users now seems to time out.

Any help would be greatly appreciated.

Thanks
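
For reference, the CL.QUORUM point in Sam's reply can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the usual Cassandra quorum rule (half of the summed replication factors across all datacenters, rounded down, plus one), it assumes the RF had been set to match the node counts given in the thread, and the datacenter names "dc1" and "aws" are made up for the example.

```python
def quorum(total_rf: int) -> int:
    """Replicas that must answer a CL.QUORUM request: floor(total_rf / 2) + 1."""
    return total_rf // 2 + 1


# Hypothetical datacenter names; the node counts are the ones quoted in the thread.
rf_matching_node_count = {"dc1": 135, "aws": 227}

# RF of 5 per DC, the upper end of the 3-to-5 rule of thumb (chosen for illustration).
rf_rule_of_thumb = {"dc1": 5, "aws": 5}

for label, rf_per_dc in (
    ("RF matching node count", rf_matching_node_count),
    ("RF of 5 per DC", rf_rule_of_thumb),
):
    total_rf = sum(rf_per_dc.values())
    print(f"{label}: total RF = {total_rf}, QUORUM needs {quorum(total_rf)} replicas")
```

Under these assumptions, a login as the default "cassandra" user would have to wait on 182 replicas spread over both datacenters, against 6 with an RF of 5 per DC. Logins for other users are read at CL.LOCAL_ONE and need a single local replica in either case.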