Have you considered the failure probability of a single disk (its mean time to failure), and then factored in that a 12-disk RAID 0 stripe is roughly 12 times as likely to fail, since losing any one disk loses the whole stripe? Then compare that to the time needed to re-replicate in degraded mode when each node holds such a large number of drives.
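To make that first point concrete, here is a rough back-of-the-envelope sketch. The per-disk MTTF of 1,000,000 hours is purely illustrative, and independent exponential failures are assumed; real drives fail in correlated batches, so treat this as an order-of-magnitude argument only.

```python
import math

# Back-of-the-envelope RAID 0 failure math (illustrative numbers, not
# measurements): with independent exponential failures, an N-disk stripe
# fails when ANY one disk fails, so its MTTF is the per-disk MTTF / N.
mttf_disk_hours = 1_000_000   # assumed per-disk MTTF, for illustration only
n_disks = 12

mttf_array_hours = mttf_disk_hours / n_disks

# Probability of at least one disk failure within one year:
hours_per_year = 24 * 365
p_disk = 1 - math.exp(-hours_per_year / mttf_disk_hours)
p_array = 1 - (1 - p_disk) ** n_disks    # stripe dies if any disk dies

print(f"array MTTF:               {mttf_array_hours:.0f} h")
print(f"P(disk fails in a year):  {p_disk:.4f}")
print(f"P(array fails in a year): {p_array:.4f}")
```

For small per-disk failure probabilities the array risk is close to the union bound of 12 x the single-disk risk, which is where the "12 times as likely" figure comes from.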
Secondly, I have a question about your configuration description: I suspect you are using software RAID (vs. hardware RAID controllers), yes? If so, you are consuming one or two cores, at a guess, to handle the RAID striping calculations for 12 drives.

Lastly, using vmstat/iostat as the baseline for your question leaves out other aspects of HDFS. HDFS multithreads writes across all the devices and across all the nodes, rather than to one large single device, and that parallelization means more parallel IO queues. So it seems to me that your question is a bit simplistic. No?

*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Sat, Jul 30, 2016 at 8:12 PM, Shady Xu <shad...@gmail.com> wrote:

> Thanks Andrew, I know about the disk failure risk and that it's one of the
> reasons why we should use JBOD. But JBOD provides worse performance than
> RAID 0. Also take into account the fact that HDFS keeps other replicas and
> will make one more replica on another DataNode when a disk failure
> happens. So why should we sacrifice performance to prevent data loss that
> HDFS naturally avoids anyway?
>
> 2016-07-31 0:36 GMT+08:00 Andrew Wright <agwli...@gmail.com>:
>
>> Yes you are.
>>
>> If you lose any one of your disks with a RAID 0 spanning all drives, you
>> will lose all the data in that directory.
>>
>> And disks do die.
>>
>> Yes, you get better single-threaded performance, but you are putting the
>> entire directory/data set at higher risk.
>>
>> Cheers
>>
>> On Saturday, July 30, 2016, Shady Xu <shad...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> It's widely known that we should mount disks to different directories
>>> without any RAID configuration because that provides the best IO
>>> performance.
>>>
>>> However, lately I have done some tests with three different
>>> configurations and found this may not be the truth.
>>> Below are the configurations and the statistics shown by the command
>>> 'iostat -x'.
>>>
>>> Configuration A: RAID 0 all 12 disks into one directory
>>> Device:  rrqm/s  wrqm/s     r/s     w/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>> sdb        0.01    0.59  112.02   65.92  15040.07  15856.86   347.27     0.32    1.81    2.36    0.86   0.93  16.49
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> Configuration B: No RAID at all
>>> Device:  rrqm/s  wrqm/s     r/s     w/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>> sdc        0.01    0.12    2.88    5.23    364.54   1247.10   397.52     0.76   93.80    9.05  140.42   2.44   1.98
>>> sdg        0.01    0.07    2.39    5.27    328.72   1246.51   410.93     0.75   97.88   10.93  137.33   2.63   2.02
>>> sdl        0.01    0.07    2.59    5.46    340.61   1299.00   407.00     0.82  102.18    9.64  146.09   2.55   2.05
>>> sdf        0.01    0.11    2.28    5.02    291.48   1197.00   407.99     0.72   99.23    9.15  140.12   2.62   1.91
>>> sdb        0.01    0.07    2.69    5.23    334.19   1238.20   396.99     0.74   93.84    8.10  137.98   2.41   1.91
>>> sde        0.01    0.11    2.81    5.27    376.54   1262.25   405.56     0.79   97.62   10.96  143.84   2.58   2.08
>>> sdk        0.01    0.12    3.02    5.20    371.92   1244.48   392.93     0.79   96.07    8.63  146.85   2.48   2.04
>>> sda        0.00    0.07    2.82    5.33    370.06   1260.68   400.52     0.78   96.09    9.72  141.74   2.49   2.03
>>> sdi        0.01    0.11    3.09    5.30    378.19   1269.98   392.63     0.78   92.47    5.98  142.88   2.31   1.94
>>> sdj        0.01    0.07    3.04    5.02    365.32   1185.24   385.01     0.74   92.22    6.31  144.29   2.40   1.93
>>> sdh        0.01    0.07    2.74    5.34    356.22   1264.28   401.06     0.78   96.81   11.36  140.75   2.55   2.06
>>> sdd        0.01    0.11    2.47    5.39    343.22   1292.23   416.20     0.76   96.48   10.26  135.96   2.54   1.99
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> Configuration C: RAID 0 each of the 12 disks individually, mounted as 12
>>> different directories
>>> Device:  rrqm/s  wrqm/s     r/s     w/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>> sdd        0.00    0.10    8.88    7.42   1067.65   1761.12   346.94     0.13    7.94    3.64   13.09   0.46   0.75
>>> sdb        0.00    0.09    8.83    7.52   1066.16   1784.79   348.65     0.13    8.02    3.75   13.02   0.47   0.76
>>> sdc        0.00    0.10    8.82    7.48   1073.74   1776.02   349.61     0.13    8.09    3.76   13.19   0.47   0.76
>>> sde        0.00    0.10    8.74    7.46   1060.79   1771.46   349.63     0.13    7.80    3.53   12.81   0.45   0.73
>>> sdg        0.00    0.10    8.93    7.46   1101.14   1772.73   350.64     0.13    7.81    3.70   12.71   0.47   0.77
>>> sdf        0.00    0.09    8.75    7.46   1062.06   1772.08   349.73     0.13    8.03    3.78   13.00   0.46   0.75
>>> sdh        0.00    0.10    9.09    7.45   1114.94   1770.07   348.76     0.13    7.83    3.69   12.89   0.47   0.77
>>> sdi        0.00    0.10    8.91    7.43   1086.85   1761.30   348.48     0.13    7.93    3.64   13.07   0.46   0.75
>>> sdj        0.00    0.10    9.04    7.46   1111.32   1768.79   349.15     0.13    7.79    3.64   12.82   0.46   0.76
>>> sdk        0.00    0.10    9.12    7.51   1122.00   1783.41   349.49     0.13    7.82    3.72   12.80   0.48   0.79
>>> sdl        0.00    0.10    8.91    7.49   1087.98   1777.77   349.49     0.13    7.89    3.69   12.89   0.46   0.75
>>> sdm        0.00    0.09    8.97    7.52   1098.82   1787.10   349.95     0.13    7.96    3.79   12.94   0.47   0.78
>>>
>>> It seems the configuration with RAID 0 across all disks in one directory
>>> provides the best disk performance, and the no-RAID configuration
>>> provides the worst. That is the opposite of the widely known claim. Am I
>>> doing anything wrong?
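One more way to read the quoted iostat snapshots: compare aggregates rather than per-device figures. Summing rkB/s and wkB/s across the 12 JBOD devices (Configuration B) and comparing against the single striped device (Configuration A) shows the JBOD node was simply offered less read traffic during its sample window. This is a sketch using the numbers copied from the thread; iostat reports observed load during the sample interval, not the hardware's maximum capability, so the two windows are not directly comparable benchmarks.

```python
# Aggregate the per-device iostat figures quoted above.
# Configuration A: the single RAID 0 device (sdb).
raid0_rkb, raid0_wkb = 15040.07, 15856.86

# Configuration B: the 12 JBOD devices, in the order listed in the thread
# (sdc, sdg, sdl, sdf, sdb, sde, sdk, sda, sdi, sdj, sdh, sdd).
jbod_rkb = [364.54, 328.72, 340.61, 291.48, 334.19, 376.54,
            371.92, 370.06, 378.19, 365.32, 356.22, 343.22]
jbod_wkb = [1247.10, 1246.51, 1299.00, 1197.00, 1238.20, 1262.25,
            1244.48, 1260.68, 1269.98, 1185.24, 1264.28, 1292.23]

agg_rkb = sum(jbod_rkb)   # aggregate read throughput across all 12 disks
agg_wkb = sum(jbod_wkb)   # aggregate write throughput across all 12 disks

print(f"JBOD aggregate: {agg_rkb:9.2f} rkB/s  {agg_wkb:9.2f} wkB/s")
print(f"RAID 0 device:  {raid0_rkb:9.2f} rkB/s  {raid0_wkb:9.2f} wkB/s")
```

The aggregate JBOD write rate (~15,000 kB/s) is close to the RAID 0 figure, while the aggregate read rate is far below it, which suggests the workloads during the two samples differed rather than that one layout is inherently several times faster.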