Ok. I see. For my use case I prefer to lose the data and have faster processing, so I will go for RAID0 and keep the replication factor at 3... If at some point I have 5 disks in the node, I will most probably give RAID5 a try and see how the performance compares to the other RAID/JBOD options.
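As a rough sketch of the tradeoff being accepted here: a RAID0 volume survives only if every member disk survives, so the volume failure probability compounds with each disk added (which is why HDFS replication at 3 is doing the real redundancy work). The 5% annual per-disk failure rate below is an assumed illustrative figure, not something from this thread:

```shell
# Failure-probability sketch for a RAID0 volume: the volume survives
# only if every member disk survives. The 5% annual per-disk failure
# rate is an assumed illustrative number.
raid0_fail_prob() {
  # $1 = per-disk annual failure probability, $2 = number of disks
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1 - (1 - p)^n }'
}

raid0_fail_prob 0.05 1   # single disk:  0.0500
raid0_fail_prob 0.05 2   # 2-disk RAID0: 0.0975
raid0_fail_prob 0.05 5   # 5-disk RAID0: 0.2262
```

With replication factor 3, losing one such volume costs a re-replication, not data, which is the bet JM is making.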
Is there a "rule", like 1 HD per core? Or can't we really simplify that much?

So far I have this in the sar output:

21:35:03      tps      rtps      wtps   bread/s   bwrtn/s
21:45:03   218,85    215,97      2,88  45441,95    308,04
21:55:02   209,73    206,67      3,06  43985,28    378,32
22:05:04   215,03    211,71      3,33  44831,00    312,95
Average:   214,54    211,45      3,09  44753,09    333,07

But I'm not sure what it means. I will wait for tomorrow to get more
results, but my job will be done overnight, so I'm not sure the average
will be accurate...

JM

2013/2/7, Kevin O'dell <kevin.od...@cloudera.com>:
> JM,
>
> I think you misunderstood me. I am not advocating any form of RAID for
> Hadoop. It is true that we already have redundancy built in with HDFS. So
> unless you were going to do something silly like sacrifice speed to run
> RAID1 or RAID5 and lower your replication to 2... just don't do it :)
> Anyway, yes, you probably should have 3 - 4 drives per node, if not more.
> At that point you will really see the benefit of JBOD over RAID0.
>
> Do you want to be able to lose a drive and keep the node up? If yes, then
> JBOD is for you. Do you not care if you lose that node due to drive
> failure? You just need speed, then RAID0 may be the correct choice. Sar
> will take some time to populate. Give it about 24 hours and you should be
> able to glean some interesting information.
>
> On Thu, Feb 7, 2013 at 9:50 PM, Jean-Marc Spaggiari
> <jean-m...@spaggiari.org> wrote:
>
>> Ok, I see: RAID0 might be better for me compared to JBOD. Also, why
>> would we want to use RAID1 or RAID5? We already have the redundancy
>> done by Hadoop; isn't it going to add another non-required level of
>> redundancy?
>>
>> Should I already plan to have 3 or even 4 drives in each node?
>>
>> I tried sar -A and it's only giving me 2 lines.
>> root@node7:/home/hbase# sar -A
>> Linux 3.2.0-4-amd64 (node7)    2013-02-07    _x86_64_    (4 CPU)
>>
>> 21:29:54     LINUX RESTART
>>
>> It was not enabled, so I just enabled it and restarted sysstat, but it
>> seems that it's still not populated.
>>
>> I have the diskstats plugin installed on ganglia, so I have a LOT of
>> disk information, but not this specific one.
>>
>> My write_bytes_per_sec is pretty low. The average is 232K for the last
>> 2 hours. But my read_bytes_per_sec averages 22.83M for the same period.
>> The graph looks like a comb.
>>
>> I just retried sar and some data is coming... I will need to let it run
>> for a few more minutes to get some more data...
>>
>> JM
>>
>>
>> 2013/2/7, Kevin O'dell <kevin.od...@cloudera.com>:
>> > JM,
>> >
>> > Okay, I think I see what was happening. You currently only have one
>> > drive in the system, which is showing high I/O wait, correct? You are
>> > looking at bringing in a second drive to help distribute the load? In
>> > your testing with two drives you saw that RAID0 offered superior
>> > performance vs JBOD. Typically when we see RAID vs JBOD we are
>> > dealing with about 6 - 12 drives. Here are some of the pluses and
>> > minuses:
>> >
>> > RAID0 - faster performance since the data is striped, but you are
>> > only as fast as your slowest drive, and with one drive failure you
>> > lose the whole volume.
>> >
>> > JBOD - better redundancy and faster than a RAID1 or a RAID5
>> > configuration (unsure about a RAID4), but you are slower than RAID0.
>> >
>> > It sounds like since you only have 1 drive in the node right now, you
>> > wouldn't be gaining or losing any redundancy by going with RAID0. For
>> > what it is worth, I would agree that you are I/O bound. If you run
>> > sar -A > /tmp/sar.out and you take a look at the drive utilization,
>> > what is the TPS (IOPS) count that you are seeing?
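A quick way to interpret the sar averages posted earlier in the thread: sar reports bread/s and bwrtn/s in 512-byte blocks (sectors), so they convert directly to bytes per second. A small sketch using the averaged figures quoted above (decimal commas swapped for points):

```shell
# sar reports bread/s and bwrtn/s in 512-byte blocks; convert the
# thread's averaged figures to MB/s.
bread_avg=44753.09    # blocks read per second (average from the thread)
bwrtn_avg=333.07      # blocks written per second (average from the thread)

awk -v r="$bread_avg" -v w="$bwrtn_avg" 'BEGIN {
  printf "reads:  %.2f MB/s\n", r * 512 / 1024 / 1024
  printf "writes: %.2f MB/s\n", w * 512 / 1024 / 1024
}'
# reads:  21.85 MB/s
# writes: 0.16 MB/s
```

Note that ~21.85 MB/s of reads lines up with the ~22.83M read_bytes_per_sec JM sees in ganglia, which suggests both tools are measuring the same read-heavy workload.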
>> >
>> > On Thu, Feb 7, 2013 at 9:00 PM, Jean-Marc Spaggiari
>> > <jean-m...@spaggiari.org> wrote:
>> >
>> >> Hi Kevin,
>> >>
>> >> I'm facing some issues on one of my nodes and I'm trying to find a
>> >> way to fix that. CPU is used about 10% by user, and 80% for WIO. So
>> >> I'm looking for a way to improve that. The motherboard can do RAIDx
>> >> and JBOD too. It's the server I used a few weeks ago to run some
>> >> disk benchmarks:
>> >>
>> >> http://www.spaggiari.org/index.php/hbase/hard-drives-performances
>> >>
>> >> The conclusion was that RAID0 was 70% faster than JBOD. But JBOD was
>> >> faster than RAID1.
>> >>
>> >> I have a 2TB drive in this server and was thinking about just adding
>> >> another 2TB drive.
>> >>
>> >> What are the advantages of JBOD compared to RAID0? From the last
>> >> tests I did, it was slower.
>> >>
>> >> Since I will have to re-format the disks anyway, I can re-run the
>> >> tests, just in case I did not configure something properly...
>> >>
>> >> JM
>> >>
>> >> 2013/2/7, Kevin O'dell <kevin.od...@cloudera.com>:
>> >> > Hey JM,
>> >> >
>> >> > Why RAID0? That has a lot of disadvantages compared to a JBOD
>> >> > configuration. Wait I/O is a symptom, not a problem. Are you
>> >> > actually experiencing a problem, or are you treating for something
>> >> > you think should be lower?
>> >> >
>> >> > On Thu, Feb 7, 2013 at 8:19 PM, Jean-Marc Spaggiari
>> >> > <jean-m...@spaggiari.org> wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> What is an acceptable CPU_WIO % while running a heavy MR job?
>> >> >> Should we also try to keep that under 10%? Or is that not
>> >> >> realistic, and we will see more like 50%?
>> >> >>
>> >> >> One of my nodes is showing 70% :( It's WAY too much. I will add
>> >> >> another disk tomorrow and put them in RAID0, but I'm wondering
>> >> >> how low should I go?
>> >> >>
>> >> >> JM
>> >> >>
>> >> >
>> >> > --
>> >> > Kevin O'Dell
>> >> > Customer Operations Engineer, Cloudera
>> >>
>> >
>> > --
>> > Kevin O'Dell
>> > Customer Operations Engineer, Cloudera
>>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
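On the "it was not enabled" point from earlier in the thread: on Debian (which node7's 3.2.0-4-amd64 kernel suggests), the sysstat collector ships disabled, and sar has no history to report until the flag in /etc/default/sysstat is flipped and the service restarted. A minimal sketch of the fix, shown against a scratch copy of the file so it runs without root (the real edit would target /etc/default/sysstat):

```shell
# On Debian, the sysstat cron collector is off by default, so sar has
# no data until ENABLED is flipped and the service restarted.
# Demonstrated on a scratch copy; the real file is /etc/default/sysstat.
conf=$(mktemp)
printf 'ENABLED="false"\n' > "$conf"   # stock Debian default

sed -i 's/ENABLED="false"/ENABLED="true"/' "$conf"
grep ENABLED "$conf"
# ENABLED="true"

# Then: service sysstat restart
# and wait for the next cron sample (every 10 minutes by default)
# before sar -A shows anything beyond the LINUX RESTART line.
rm -f "$conf"
```

This matches JM's observation that sar stayed empty right after enabling it and only started filling in a few minutes later.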