Correction: “most of your database will be in chunk cache, or buffer cache anyways.”

From: Reid Pinchback <rpinchb...@tripadvisor.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 6, 2019 at 10:16 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: AWS ephemeral instances + backup

If you’re only going to have a small storage footprint per node, like 100 GB, another
option comes to mind. Use an instance type with large RAM. Use an EBS storage volume
on an EBS-optimized instance type, and take EBS snapshots. Most of your database will
be in chunk cache anyways, so you only need to make sure that the dirty background
writer is keeping up. I’d take a look at iowait during a snapshot and see if the
results are acceptable for a running node. Even if it is marginal, if you’re only
snapshotting one node at a time, then speculative retry would just skip over the
temporary slowpoke.
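
A minimal sketch of what that flush-then-snapshot step could look like with boto3; the
volume ID, region, and description are placeholders, not anything from this thread:

    import subprocess
    import boto3

    DATA_VOLUME_ID = "vol-0123456789abcdef0"   # hypothetical EBS volume backing the data directory
    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Flush memtables so the on-disk sstables are as current as possible before the snapshot
    subprocess.run(["nodetool", "flush"], check=True)

    snap = ec2.create_snapshot(
        VolumeId=DATA_VOLUME_ID,
        Description="cassandra data volume snapshot",
    )
    print("started snapshot", snap["SnapshotId"])

    # Wait for completion; this is the window in which you'd watch iowait on the node
    waiter = ec2.get_waiter("snapshot_completed")
    waiter.wait(SnapshotIds=[snap["SnapshotId"]])
    print("snapshot complete")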

From: Carl Mueller <carl.muel...@smartthings.com.INVALID>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, December 5, 2019 at 3:21 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: AWS ephemeral instances + backup

Does anyone have experience with tooling written to support this strategy:

Use case: run Cassandra on i3 instances using the ephemeral storage, but synchronize
the sstables and commitlog files to the cheapest EBS volume type (those have bad IOPS
but decent enough throughput).

On node replace, the startup script for the node back-copies the sstables and
commitlog state from EBS to the ephemeral storage.
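
An illustrative sketch of the two halves of that tooling, assuming the ephemeral data
lives under /var/lib/cassandra and the cheap EBS volume is mounted at /mnt/ebs-backup
(both paths are placeholders); real tooling would also need scheduling, retention, and
Cassandra stopped during the restore:

    import subprocess

    EPHEMERAL_DATA = "/var/lib/cassandra/"      # sstables + commitlog on the i3 ephemeral
    EBS_BACKUP = "/mnt/ebs-backup/cassandra/"   # cheap, low-IOPS EBS volume

    def sync_to_ebs():
        """Periodically (e.g. from cron) mirror the ephemeral data onto EBS."""
        subprocess.run(["rsync", "-a", "--delete", EPHEMERAL_DATA, EBS_BACKUP], check=True)

    def restore_from_ebs():
        """On node replace, back-copy from EBS before starting Cassandra."""
        subprocess.run(["rsync", "-a", EBS_BACKUP, EPHEMERAL_DATA], check=True)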

As can be seen here:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

The (presumably) spinning rust tops out at 2375 MB/sec (presumably across multiple EBS
volumes). That would incur about a ten-minute delay for node replacement on a 1 TB
node, but I imagine this would only be used on higher-IOPS read/write nodes with
smaller densities, so 100 GB would be only about a minute of delay, already within the
timeframe of an AWS node replacement/instance restart.
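
The arithmetic behind those estimates, taking the 2375 MB/sec figure above at face value:

    throughput_mb_s = 2375
    for size_gb in (1000, 100):                  # ~1 TB node vs. 100 GB node
        seconds = size_gb * 1000 / throughput_mb_s
        print(f"{size_gb} GB -> ~{seconds / 60:.1f} min")
    # 1000 GB -> ~7.0 min (roughly the "about ten minutes" above, with overhead)
    # 100 GB  -> ~0.7 min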

