If you’re only going to have a small storage footprint per node, say 100 GB, another option comes to mind: use an instance type with a lot of RAM, put the data on an EBS volume attached to an EBS-optimized instance type, and take EBS snapshots. Most of your database will be in the chunk cache anyway, so you only need to make sure that the background writer for dirty pages is keeping up. I’d take a look at iowait during a snapshot and see whether the results are acceptable for a running node. Even if it is marginal, if you’re only snapshotting one node at a time, speculative retry would just skip over the temporary slowpoke.
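For the iowait check, here is a minimal sketch of one way to measure it on a Linux node, assuming the standard /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal). The function names and the 5-second interval are illustrative choices of mine, not part of any Cassandra or AWS tooling; iostat or sar would give you the same number.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before_text, after_text):
    """Fraction of CPU time spent in iowait between two /proc/stat samples."""
    # Assumed field order: user nice system idle iowait irq softirq steal.
    # Guest time is already counted inside user, so only the first 8 are summed.
    before = parse_cpu_line(before_text)[:8]
    after = parse_cpu_line(after_text)[:8]
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return deltas[4] / total if total > 0 else 0.0


def sample_iowait(interval_s=5.0, path="/proc/stat"):
    """Take two samples interval_s apart and return the iowait fraction."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    return iowait_fraction(before, after)
```

Calling sample_iowait() in a loop while a snapshot is in progress, and again when the node is idle, gives two sets of numbers to compare. What counts as acceptable is a judgment call for your workload.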
From: Carl Mueller <carl.muel...@smartthings.com.INVALID>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, December 5, 2019 at 3:21 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: AWS ephemeral instances + backup

Does anyone have experience with tooling written to support this strategy?

Use case: run Cassandra on i3 instances on ephemerals, but synchronize the sstables and commitlog files to the cheapest EBS volume type (those have bad IOPS but decent enough throughput).

On node replace, the startup script for the node back-copies the sstables and commitlog state from the EBS to the ephemeral.

As can be seen:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
the (presumably) spinning rust tops out at 2375 MB/sec (presumably using multiple EBS volumes). That would incur about a ten-minute delay for node replacement for a 1TB node, but I imagine this would only be used on higher-IOPS r/w nodes with smaller densities, so 100GB would be about a minute of delay only, already within the timeframes of an AWS node replacement/instance restart.
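
The delay estimates above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes decimal units (1 GB = 1000 MB) and that the copy sustains the full 2375 MB/sec quoted above, which a real restore may not reach. The function name is illustrative.

```python
def copy_seconds(data_gb, throughput_mb_per_s):
    """Ideal time in seconds to stream data_gb at a sustained throughput."""
    # Assumes decimal units: 1 GB = 1000 MB.
    return data_gb * 1000 / throughput_mb_per_s


for size_gb in (1000, 100):  # the 1TB and 100GB node sizes from the message
    seconds = copy_seconds(size_gb, 2375)
    print(f"{size_gb} GB: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

At that ceiling the pure streaming time is roughly 7 minutes for 1TB and about 42 seconds for 100GB, so the ten-minute and one-minute figures leave some headroom for anything slower than the ideal.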