Hi,

We have a cluster built on Apache Cassandra 3.9 with 6 nodes and RF = 3. As part of rebuilding the cluster, we are testing our backup and restore strategy.
We took a snapshot on each node and uploaded the files to S3, saving each node's data under its own folder name (backup_folder1 - 6 for nodes 1 - 6). We then created a new cluster with the same number of nodes, copied the data back from S3, and created the schema.

*Strategy 1: using nodetool refresh*
1) Copied the data back from S3, one folder per machine (backup_folder1 - 6 to the 6 nodes).
2) Performed nodetool refresh across the cluster.
Ran the count:
Count on previous cluster: 12125800
Count on new cluster: 10504780

*Strategy 2: using sstableloader*
1) Copied the data back from S3, one folder per machine (backup_folder1 - 6 to the 6 nodes).
2) Ran sstableloader on each node.
Ran the count:
Count on previous cluster: 12125800
Count on new cluster: 11705084

Looking at the results, I am a bit disappointed that neither approach gave me a 100% restore. If the problem were in taking the backup, the two approaches should not have given different counts. Any ideas on a reliable backup and restore strategy, and what could have gone wrong in my process? I have sketched roughly what we ran in the P.S. below.

Thank You,
Regards,
Srini
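P.S. For reference, here is roughly the shape of the commands we ran at each step. The keyspace/table names (ks, t1), the S3 bucket name, the node IPs, and the data directory paths below are placeholders for illustration, not our real names; our actual commands followed the same pattern.

    # 1. Take a snapshot on every node (placeholder keyspace "ks", tag "backup1")
    nodetool snapshot -t backup1 ks

    # 2. Upload each node's snapshot files to S3 (node N -> backup_folderN)
    aws s3 cp --recursive \
        /var/lib/cassandra/data/ks/t1-<table-uuid>/snapshots/backup1/ \
        s3://my-backup-bucket/backup_folder1/

    # 3a. Strategy 1: on each new node, copy the files into the table's data
    #     directory, then load them into the running node without a restart
    aws s3 cp --recursive s3://my-backup-bucket/backup_folder1/ \
        /var/lib/cassandra/data/ks/t1-<table-uuid>/
    nodetool refresh ks t1

    # 3b. Strategy 2: stream the same files into the new cluster instead
    #     (the directory passed to sstableloader must end in <keyspace>/<table>)
    sstableloader -d <node1_ip>,<node2_ip> /restore/backup_folder1/ks/t1

    # 4. Count rows on the old and new clusters
    cqlsh <node_ip> -e "SELECT COUNT(*) FROM ks.t1;"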