Hello Sebastian,

Yes, your approach is really interesting. I will test it in my system as well. I think it reduces some of the risks involved in the procedure that was discussed in the previous emails.

Just for the record, availability is a top priority for my use cases, which is why I have switched the default consistency level for authentication/authorization to LOCAL_ONE, as it was used in previous C* versions.
BR
MK

-----Original Message-----
From: Sebastian Marsching <sebast...@marsching.com>
Sent: April 22, 2024 21:58
To: Michalis Kotsiouros (EXT) via user <user@cassandra.apache.org>
Subject: Re: Datacenter decommissioning on Cassandra 4.1.4

Recently, I successfully used the following procedure when decommissioning a datacenter:

1. Reduced the replication factor for this DC to zero for all keyspaces except the system_auth keyspace. For that keyspace, I reduced the RF to one.
2. Decommissioned all nodes except one in the DC using the regular procedure (no --force needed).
3. Decommissioned the last node using --force.
4. Set the RF for the system_auth keyspace to 0.

This procedure has two benefits:

1. Authentication on the nodes in the DC being decommissioned will work until the last node has been decommissioned. This is important when authentication is enabled for JMX. Otherwise, you cannot proceed when there are too few nodes left to get a LOCAL_QUORUM on system_auth.
2. One does not have to use --force except when removing the last node.

It would be nice if the RF for the system_auth keyspace could be reduced to zero before decommissioning the nodes. However, I think that implementing this correctly may be hard. If there are no local replicas, queries with a consistency level of LOCAL_QUORUM will probably fail, and this is the consistency level used for all authentication and authorization related queries. So, setting the RF to zero might break authentication and authorization, which in turn might make it impossible to decommission the nodes (without disabling authentication for that DC).

So, I guess that the code dealing with authentication and authorization would have to be changed to use a CL of QUORUM instead of LOCAL_QUORUM when system_auth is not replicated in the local DC.
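
For anyone who wants to script the four steps above, here is a minimal sketch that only prints the commands in order and does not run anything. The keyspace names, DC names, host names, and the helper functions `replication_cql` and `decommission_plan` are all hypothetical and not from this thread. It assumes every keyspace uses NetworkTopologyStrategy and that an RF of 0 is written explicitly in the replication map.

```python
from typing import Dict, List


def replication_cql(keyspace: str, dc_rf: Dict[str, int]) -> str:
    """Build an ALTER KEYSPACE statement (assumes NetworkTopologyStrategy)."""
    opts = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_rf.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {opts}}};"
    )


def decommission_plan(
    keyspaces: Dict[str, Dict[str, int]],  # keyspace -> {dc name: current RF}
    old_dc: str,                           # DC that is being removed
    nodes: List[str],                      # hosts in old_dc (hypothetical names)
) -> List[str]:
    """Return the commands for the four-step procedure, in order."""
    if not nodes:
        raise ValueError("old_dc must contain at least one node")

    plan: List[str] = []

    # Step 1: RF 0 in old_dc for every keyspace, except system_auth -> RF 1.
    for name, dc_rf in keyspaces.items():
        new_rf = dict(dc_rf)
        new_rf[old_dc] = 1 if name == "system_auth" else 0
        plan.append(replication_cql(name, new_rf))

    # Step 2: regular decommission for all nodes except the last one.
    for host in nodes[:-1]:
        plan.append(f"nodetool -h {host} decommission")

    # Step 3: the last node needs --force.
    plan.append(f"nodetool -h {nodes[-1]} decommission --force")

    # Step 4: finally drop system_auth replication in old_dc to 0.
    if "system_auth" in keyspaces:
        final_rf = dict(keyspaces["system_auth"])
        final_rf[old_dc] = 0
        plan.append(replication_cql("system_auth", final_rf))

    return plan


if __name__ == "__main__":
    example = {
        "app_data": {"dc1": 3, "dc2": 3},
        "system_auth": {"dc1": 3, "dc2": 3},
    }
    for command in decommission_plan(example, "dc2", ["node1", "node2", "node3"]):
        print(command)
```

The printed list is meant to be reviewed and run by hand, one step at a time, checking the cluster state (for example with nodetool status) between steps.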