On 2/19/2014 4:52 PM, Andrew Hume wrote:
an almost counter-intuitive finding.

http://cloud.media.seagate.com/2014/02/18/when-is-my-data-too-big-for-a-raid-storage-solution/?utm_source=linkedin&utm_medium=social&utm_content=Oktopost-LinkedIn-Group&utm_campaign=%28Oktopost%29Feb+2014



This article ignores (purposefully? unintentionally?) some current trends that are highly relevant.

1) The trend towards decoupled (declustered) RAID. There is nothing that says you have to RAID across a full drive. Recent systems (Isilon, GPFS GNR) divide the disks into chunks and 'protect' (Reed-Solomon, XOR, erasure coding, or other encoding) across those chunks. When a disk fails, its chunks get rebuilt in parallel across <n> other disks, so rebuild time is no longer proportional to the size of the failed drive; the drive's capacity becomes largely irrelevant. Rebuilding all of the chunks that happened to live on that disk typically takes 5-10 minutes. There's no added overhead, because the overall amount of data protection is the same.
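
To get a feel for why the parallel rebuild matters, here's a rough back-of-envelope comparison (a Python sketch; the capacity, per-disk rebuild rate, and peer-disk count are made-up illustrative numbers, not measurements from any of these systems):

  # Back-of-envelope rebuild-time comparison; all numbers are illustrative.
  disk_size_tb = 4.0            # capacity of the failed disk
  rebuild_mb_per_s = 100.0      # sustained rebuild rate one disk can absorb
  peer_disks = 100              # disks sharing the rebuild in the decoupled case

  disk_size_mb = disk_size_tb * 1e6

  # Traditional RAID: one spare drive absorbs the entire rebuild.
  traditional_hours = disk_size_mb / rebuild_mb_per_s / 3600

  # Decoupled RAID: the failed disk's chunks rebuild in parallel across peers.
  decoupled_minutes = disk_size_mb / (rebuild_mb_per_s * peer_disks) / 60

  print(f"whole-drive rebuild:      {traditional_hours:.1f} hours")
  print(f"chunked parallel rebuild: {decoupled_minutes:.1f} minutes")

With those made-up numbers the whole-drive rebuild takes on the order of 11 hours, while the chunked parallel rebuild finishes in a handful of minutes, the same order of magnitude as the 5-10 minutes quoted above.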

2) The trend towards data verification that doesn't wait until a drive fails to discover that data blocks elsewhere have also gone bad (the simultaneous-failure case). This includes ZFS and Btrfs integrity checksums plus scrubbing, erasure coding in GPFS et al., and N+M RAID. By doing periodic scrubs, you don't wait for the single failure to discover the double: you proactively read back all of the blocks and remap suspect ones early. (You also get the benefit of finding on-disk bit flips if you have that capability. It does happen; we see 1-2 per year over 1PB of disk.)
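
For concreteness, here is roughly what a scrub pass amounts to, as a minimal Python sketch. This is not ZFS's or Btrfs's actual implementation; read_block, stored_checksums, and repair_block are hypothetical hooks standing in for the filesystem's own block I/O, checksum store, and redundancy-based repair:

  import hashlib

  def scrub(read_block, stored_checksums, repair_block):
      """Read every block, verify its stored checksum, and repair mismatches
      from redundant data (parity or a good copy) before a second failure
      can make them unrecoverable."""
      suspect = []
      for i, expected in enumerate(stored_checksums):
          data = read_block(i)
          if hashlib.sha256(data).hexdigest() != expected:
              suspect.append(i)      # latent (silent) error found early
              repair_block(i)        # rewrite/remap from parity or a good copy
      return suspect

Run periodically across the whole pool, this is what turns a would-be double failure during a rebuild into an ordinary, repairable single error.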

So, to some extent the article sets up a straw man just to knock it down, but it seems suspiciously unaware of the state of the art.

Replication, to me, solves a slightly different problem. Its big application is protection against infrastructure loss (datacenter power loss, network loss, fire, etc.). It doesn't save you from bit flips without something else on top, and it DEFINITELY doesn't protect you from the GIGO problem (garbage replicates just as effectively as data). Also, it doesn't weed out failing disks that haven't been read in a while the way scrubbing would.

What happens in replication when your primary datacenter goes offline? All of a sudden there's a big uptick in reads from your secondary, and "oh no", you have a disk failure from the load spike (the classic single-disk RAID scenario). You still want RAID on your replicated data because you still have the same problems there (or the same lack of problems, with respect to decoupled RAID, erasure coding, scrubbing, etc.).

Yes, on the surface, RAID is not enough. But to paraphrase the old joke:
patient: "hey doc, it hurts when I close the door on my foot"
doc: "well, don't do that".


