A lot of people use the old gen instances (m1 in particular) because they came 
with a ton of effectively free ephemeral storage (up to 1.6TB). Whether or not 
they’re viable is a decision for each user to make. They’re very, very commonly 
used for C*, though. At a time when EBS was not sufficiently robust or 
reliable, a cluster of m1 instances was the de facto standard. 

The canonical “best practice” in 2015 was i2. We believe we’ve made a 
compelling argument for using m4 or c4 instead of i2. We know of a company 
currently testing d2 at scale, though I’m not sure they have much in the way 
of concrete results at this time. 

- Jeff

From:  Jack Krupansky
Reply-To:  "user@cassandra.apache.org"
Date:  Monday, February 1, 2016 at 1:55 PM
To:  "user@cassandra.apache.org"
Subject:  Re: EC2 storage options for C*

Thanks. My typo - I referenced "C2 Dense Storage" which is really "D2 Dense 
Storage". 

The remaining question is whether any of the "Previous Generation Instances" 
should be publicly recommended going forward.

And whether non-SSD instances should be recommended as well. Sure, 
technically, someone could use the legacy instances, but the question is what 
we should be recommending as best practice going forward.

Yeah, the i2 instances look like the sweet spot for any non-EBS clusters.

-- Jack Krupansky

On Mon, Feb 1, 2016 at 4:30 PM, Steve Robenalt <sroben...@highwire.org> wrote:
Hi Jack, 

At the bottom of the instance-types page, there is a link to the previous 
generations, which includes the older series (m1, m2, etc), many of which have 
HDD options. 

There are also the d2 (Dense Storage) instances in the current generation that 
include various combos of local HDDs.

The i2 series has good-sized SSDs available, and has the enhanced networking 
option, which is also useful for Cassandra. Enhanced networking is available 
with other instance types as well, as you'll see on the feature list under 
each type. 

Steve



On Mon, Feb 1, 2016 at 1:17 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
Thanks. Reading a little bit on AWS, and back to my SSD vs. magnetic question, 
it seems like magnetic (HDD) is no longer a recommended storage option for 
databases on AWS. In particular, only the C2 Dense Storage instances have local 
magnetic storage - all the other instance types are SSD or EBS-only - and EBS 
Magnetic is only recommended for "Infrequent Data Access." 

For the record, that AWS doc has Cassandra listed as a use case for i2 instance 
types.

Also, the AWS doc lists EBS io1 (PIOPS) for the NoSQL database use case and 
gp2 only for the "small to medium databases" use case.

Do older instances with local HDD still exist on AWS (m1, m2, etc.)? Is the doc 
simply for any newly started instances?

See:
https://aws.amazon.com/ec2/instance-types/
http://aws.amazon.com/ebs/details/


-- Jack Krupansky

On Mon, Feb 1, 2016 at 2:09 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
> My apologies if my questions are actually answered on the video or slides, I 
> just did a quick scan of the slide text.

Virtually all of them are covered.

> I'm curious where the EBS physical devices actually reside - are they in the 
> same rack, the same data center, same availability zone? I mean, people try 
> to minimize network latency between nodes, so how exactly is EBS able to 
> avoid network latency?

Not published, and probably not a straightforward answer (they probably have 
cross-AZ redundancy, if it matches some of their other published behaviors). 
The promise they give you is ‘iops’, with a certain block size. Some instance 
types are EBS-optimized, with dedicated EBS-only network interfaces. Like most 
things in Cassandra / cloud, the only way to know for sure is to test it 
yourself and see if observed latency is acceptable (or trust our testing, if 
you assume we’re sufficiently smart and honest). 
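
To make “test it yourself” concrete, here’s a rough sketch in Python (the file 
path is hypothetical, and the page cache will flatter these numbers unless you 
bypass it - a real test should use fio or cassandra-stress):

    # Sketch only: sample random-read latencies from a file on the volume
    # under test and report a rough p99. Not a substitute for fio or
    # cassandra-stress; the page cache makes these numbers optimistic.
    import os
    import random
    import time

    def p99_read_latency(path, block=4096, samples=1000):
        size = os.path.getsize(path)
        fd = os.open(path, os.O_RDONLY)
        latencies = []
        try:
            for _ in range(samples):
                offset = random.randrange(0, max(1, size - block))
                start = time.perf_counter()
                os.pread(fd, block, offset)  # one small random read
                latencies.append(time.perf_counter() - start)
        finally:
            os.close(fd)
        latencies.sort()
        return latencies[int(0.99 * (len(latencies) - 1))]

    # e.g. p99_read_latency("/mnt/ebs/testfile")  # hypothetical path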

> Did your test use Amazon EBS–Optimized Instances?

We tested dozens of instance type/size combinations (literally). The best 
performance was clearly with EBS-optimized instances that also have enhanced 
networking (c4, m4, etc.) - slide 43

> SSD or magnetic or does it make any difference?

SSD, GP2 (slide 64)

> What info is available on EBS performance at peak times, when multiple AWS 
> customers have spikes of demand?

Not published, but experiments show that we can hit 10k iops all day every day 
with only trivial noisy neighbor problems, not enough to impact a real cluster 
(slide 58)

> Is RAID much of a factor or help at all using EBS?

You can use RAID to get higher IOPS than a single volume gives you by default 
(the GP2 IOPS cap is 10k per volume, which you reach with a 3.333T volume – if 
you need more than 10k, you can stripe volumes together, up to the EBS network 
link max) (hinted at in slide 64)
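
The GP2 arithmetic, as a quick sketch (2016-era limits as described above: 3 
IOPS/GB baseline, 10k cap per volume; function names are mine, not an AWS 
API):

    # Sketch of 2016-era GP2 IOPS arithmetic: 3 IOPS/GiB, floor of 100,
    # cap of 10,000 per volume. RAID 0 aggregates volumes until the
    # instance's EBS network link becomes the bottleneck.
    def gp2_iops(volume_gib):
        return min(max(100, volume_gib * 3), 10_000)

    def raid0_iops(volume_gib, n_volumes):
        return n_volumes * gp2_iops(volume_gib)

    print(gp2_iops(3334))       # 10000 -- the per-volume cap (~3.333T)
    print(raid0_iops(3334, 2))  # 20000 -- if the network link allows it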

> How exactly is EBS provisioned in terms of its own HA - I mean, with a 
> properly configured Cassandra cluster RF provides HA, so what is the 
> equivalent for EBS? If I have RF=3, what assurance is there that those three 
> EBS volumes aren't all in the same physical rack?

There is HA; I’m not sure that AWS publishes specifics. Occasionally specific 
volumes will have issues (hypervisor’s dedicated ethernet link to EBS network 
fails, for example). Occasionally instances will have issues. The 
volume-specific issues seem to be less common than the instance-store “instance 
retired” or “instance is running on degraded hardware” events. Stop/Start and 
you’ve recovered (possible with EBS, not possible with instance store). The 
assurances are in AWS’ SLA – if the SLA is insufficient (and it probably is 
insufficient), use more than one AZ and/or AWS region or cloud vendor.
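
For what it’s worth, that stop/start recovery is scriptable - a minimal boto3 
sketch (the instance ID and region here are hypothetical):

    # Sketch: recover from an "instance running on degraded hardware"
    # event by stop/starting an EBS-backed instance. On start, the
    # instance lands on different hardware and its EBS volumes follow it;
    # instance-store data would not survive this.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # hypothetical region
    instance_id = "i-0123456789abcdef0"                 # hypothetical instance

    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])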

> For multi-data center operation, what configuration options assure that the 
> EBS volumes for each DC are truly physically separated?

It used to be true that the EBS control plane for a given region spanned AZs; 
that’s no longer the case. AWS asserts that failure modes for each AZ are isolated 
(data may replicate between AZs, but a full outage in us-east-1a shouldn’t 
affect running ebs volumes in us-east-1b or us-east-1c). Slide 65
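
On the Cassandra side, the usual way to exploit that isolation is Ec2Snitch 
(which maps each AZ to a rack) plus NetworkTopologyStrategy, so RF=3 replicas 
land in three AZs and don’t share one EBS failure domain. A sketch with the 
Python driver (keyspace name and contact point are hypothetical):

    # Sketch: with Ec2Snitch, the region becomes the DC ("us-east") and
    # the AZ suffix becomes the rack ("1a", "1b", ...). RF=3 with
    # NetworkTopologyStrategy then spreads replicas across AZs.
    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()  # hypothetical contact point
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS myks
        WITH replication = {'class': 'NetworkTopologyStrategy',
                            'us-east': 3}
    """)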

> In terms of syncing data for the commit log, if the OS call to sync an EBS 
> volume returns, is the commit log data absolutely 100% synced at the hardware 
> level on the EBS end, such that a power failure of the systems on which the 
> EBS volumes reside will still guarantee availability of the fsynced data. As 
> well, is return from fsync an absolute guarantee of sstable durability when 
> Cassandra is about to delete the commit log, including when the two are on 
> different volumes? In practice, we would like some significant degree of 
> pipelining of data, such as during the full processing of flushing memtables, 
> but for the fsync at the end a solid guarantee is needed.

Most of the answers in this block are “probably not 100%, you should be writing 
to more than one host/AZ/DC/vendor to protect your organization from failures”. 
AWS targets something like a 0.1% annual failure rate per volume and 99.999% 
availability (slide 66). We believe they’re exceeding those goals (at least 
based on the petabytes of data we have on gp2 volumes).  
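
The invariant behind those questions is the same on any backend: durability is 
bounded by the last completed fsync. A toy illustration (not Cassandra’s 
actual commitlog code):

    # Toy sketch of the fsync-then-acknowledge rule: a write may only be
    # acknowledged (or a commit log segment deleted) after fsync returns.
    # On any storage - EBS or local - data not yet fsynced can be lost
    # on power failure.
    import os

    def durable_append(path, record):
        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
        try:
            os.write(fd, record)
            os.fsync(fd)  # only after this returns is the record durable
        finally:
            os.close(fd)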



From: Jack Krupansky
Reply-To: "user@cassandra.apache.org"
Date: Monday, February 1, 2016 at 5:51 AM 

To: "user@cassandra.apache.org"
Subject: Re: EC2 storage options for C*

I'm not a fan of video - this appears to be the slideshare corresponding to the 
video: 
http://www.slideshare.net/AmazonWebServices/bdt323-amazon-ebs-cassandra-1-million-writes-per-second

My apologies if my questions are actually answered on the video or slides, I 
just did a quick scan of the slide text.

I'm curious where the EBS physical devices actually reside - are they in the 
same rack, the same data center, same availability zone? I mean, people try to 
minimize network latency between nodes, so how exactly is EBS able to avoid 
network latency? 

Did your test use Amazon EBS–Optimized Instances?

SSD or magnetic or does it make any difference?

What info is available on EBS performance at peak times, when multiple AWS 
customers have spikes of demand?

Is RAID much of a factor or help at all using EBS?

How exactly is EBS provisioned in terms of its own HA - I mean, with a properly 
configured Cassandra cluster RF provides HA, so what is the equivalent for EBS? 
If I have RF=3, what assurance is there that those three EBS volumes aren't all 
in the same physical rack?

For multi-data center operation, what configuration options assure that the EBS 
volumes for each DC are truly physically separated?

In terms of syncing data for the commit log, if the OS call to sync an EBS 
volume returns, is the commit log data absolutely 100% synced at the hardware 
level on the EBS end, such that a power failure of the systems on which the EBS 
volumes reside will still guarantee availability of the fsynced data. As well, 
is return from fsync an absolute guarantee of sstable durability when Cassandra 
is about to delete the commit log, including when the two are on different 
volumes? In practice, we would like some significant degree of pipelining of 
data, such as during the full processing of flushing memtables, but for the 
fsync at the end a solid guarantee is needed.


-- Jack Krupansky

On Mon, Feb 1, 2016 at 12:56 AM, Eric Plowe <eric.pl...@gmail.com> wrote:
Jeff, 

If EBS goes down, then EBS GP2 will go down as well, no? I'm not discounting 
EBS, but prior outages are worrisome. 


On Sunday, January 31, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
Free to choose what you'd like, but EBS outages were also addressed in that 
video (second half, discussion by Dennis Opacki). 2016 EBS isn't the same as 
2011 EBS. 

-- 
Jeff Jirsa


On Jan 31, 2016, at 8:27 PM, Eric Plowe <eric.pl...@gmail.com> wrote:

Thank you all for the suggestions. I'm torn between GP2 and ephemeral storage. 
After testing, GP2 is a viable contender for our workload. The only worry I 
have is EBS outages, which have happened. 

On Sunday, January 31, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
Also in that video - it's long but worth watching

We tested up to 1M reads/second as well, blowing out page cache to ensure we 
weren't "just" reading from memory



-- 
Jeff Jirsa


On Jan 31, 2016, at 9:52 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

How about reads? Any differences between read-intensive and write-intensive 
workloads?

-- Jack Krupansky

On Sun, Jan 31, 2016 at 3:13 AM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
Hi John,

We run on 4T GP2 volumes, which guarantee 10k IOPS. Even at 1M writes per 
second on 60 nodes, we didn’t come close to hitting even 50% utilization (10k 
is more than enough for most workloads). PIOPS is not necessary. 



From: John Wong
Reply-To: "user@cassandra.apache.org"
Date: Saturday, January 30, 2016 at 3:07 PM
To: "user@cassandra.apache.org"
Subject: Re: EC2 storage options for C*

For production I'd stick with ephemeral disks (aka instance storage) if you 
are running a lot of transactions. 
However, for a regular small testing/QA cluster, or something you know you want 
to reload often, EBS is definitely good enough, and we haven't had issues 99% 
of the time. The 1% is an anomaly where we saw flushes blocked.

But Jeff, kudos that you are able to use EBS. I didn't go through the video - 
do you actually use PIOPS or just standard GP2 in your production cluster?

On Sat, Jan 30, 2016 at 1:28 PM, Bryan Cheng <br...@blockcypher.com> wrote:
Yep, that motivated my question "Do you have any idea what kind of disk 
performance you need?". If you need the performance, it's hard to beat 
ephemeral SSD in RAID 0 on EC2, and it's a solid, battle-tested configuration. 
If you don't, though, EBS GP2 will save a _lot_ of headache.
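
For reference, that battle-tested setup is only a few commands - a sketch 
driven from Python (device names are hypothetical and vary by instance type; 
run as root):

    # Sketch: stripe two ephemeral SSDs into RAID 0 and mount the array
    # for Cassandra. Device names below are assumptions; check your
    # instance's block device mapping first.
    import subprocess

    devices = ["/dev/xvdb", "/dev/xvdc"]  # hypothetical ephemeral devices

    subprocess.run(["mdadm", "--create", "/dev/md0", "--level=0",
                    "--raid-devices=" + str(len(devices))] + devices,
                   check=True)
    subprocess.run(["mkfs.ext4", "/dev/md0"], check=True)
    subprocess.run(["mkdir", "-p", "/var/lib/cassandra"], check=True)
    subprocess.run(["mount", "/dev/md0", "/var/lib/cassandra"], check=True)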

Personally, on small clusters like ours (12 nodes), we've found our choice of 
instance dictated much more by the balance of price, CPU, and memory. We're 
using GP2 SSD and we find that for our patterns the disk is rarely the 
bottleneck. YMMV, of course.

On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
If you have to ask that question, I strongly recommend m4 or c4 instances with 
GP2 EBS.  When you don’t care about replacing a node because of an instance 
failure, go with i2+ephemerals. Until then, GP2 EBS is capable of amazing 
things, and greatly simplifies life.

We gave a talk on this topic at both Cassandra Summit and AWS re:Invent: 
https://www.youtube.com/watch?v=1R-mgOcOSd4 It’s very much a viable option, 
despite any old documents online that say otherwise.



From: Eric Plowe
Reply-To: "user@cassandra.apache.org"
Date: Friday, January 29, 2016 at 4:33 PM
To: "user@cassandra.apache.org"
Subject: EC2 storage options for C*

My company is planning on rolling out a C* cluster in EC2. We are thinking 
about going with ephemeral SSDs. The question is this: should we put two in 
RAID 0 or just go with one? We currently run a cluster in our data center with 
two 250 GB Samsung 850 EVOs in RAID 0, and we are happy with the performance 
we are seeing thus far.

Thanks!

Eric








-- 
Steve Robenalt 
Software Architect
sroben...@highwire.org 
(office/cell): 916-505-1785

HighWire Press, Inc.
425 Broadway St, Redwood City, CA 94063
www.highwire.org

Technology for Scholarly Communication

