Ok, I see. For my use case I prefer to risk losing the data and get
faster processing, so I will go for RAID0 and keep the replication
factor at 3... If at some point I have 5 disks in the node, I will most
probably give RAID5 a try and compare its performance against the other
RAID/JBOD options.
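
For the record, that replication factor is the dfs.replication property
in hdfs-site.xml (it already defaults to 3), and it can also be changed
per path after the fact. A small sketch, with a hypothetical path:

    # keep the cluster-wide default of 3 copies, or adjust one dataset:
    hadoop fs -setrep -R 3 /user/jm/data    # /user/jm/data is made up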

Is there a rule of thumb, like 1 hard drive per core? Or can't we really
simplify that much?

So far I have that in the sar output:
21:35:03          tps      rtps      wtps   bread/s   bwrtn/s
21:45:03       218,85    215,97      2,88  45441,95    308,04
21:55:02       209,73    206,67      3,06  43985,28    378,32
22:05:04       215,03    211,71      3,33  44831,00    312,95
Average :      214,54    211,45      3,09  44753,09    333,07

But I'm not sure what it means. I will wait until tomorrow to get more
results, but my job will run overnight, so I'm not sure the average
will be accurate...
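
For what it's worth, sar's bread/s and bwrtn/s columns count 512-byte
blocks, so the average above converts roughly like this (back-of-the-
envelope only, and it happens to line up with the Ganglia read figure
quoted below):

    # 44753.09 blocks/s * 512 bytes =~ 22.9 MB/s of reads
    echo "44753.09 * 512 / 1000000" | bc -l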

JM


2013/2/7, Kevin O'dell <kevin.od...@cloudera.com>:
> JM,
>
>   I think you misunderstood me.  I am not advocating any form of RAID for
> Hadoop.  It is true that we already have redundancy built in with HDFS.  So
> unless you were going to do something silly like sacrificing speed to run
> RAID1 or RAID5 and lowering your replication to 2...just don't do it :)
>  Anyway, yes, you probably should have 3 - 4 drives per node, if not more.
>  At that point you will really see the benefit of JBOD over RAID0.
>
> Do you want to be able to lose a drive and keep the node up?  If yes, then
> JBOD is for you.  Do you not care about losing the node to a drive failure
> and just need speed?  Then RAID0 may be the correct choice.  Sar will take
> some time to populate.  Give it about 24 hours and you should be able to
> glean some interesting information.
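>
> If you do go JBOD, the usual wiring is one mount point per physical disk
> listed in hdfs-site.xml (dfs.data.dir on Hadoop 1.x, dfs.datanode.data.dir
> on 2.x). A rough sketch with hypothetical device names:
>
>     # each disk gets its own filesystem and mount point
>     mount /dev/sdb1 /data/1
>     mount /dev/sdc1 /data/2
>     # then list every mount point in hdfs-site.xml, comma separated:
>     #   <name>dfs.data.dir</name>
>     #   <value>/data/1/dfs,/data/2/dfs</value>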
>
> On Thu, Feb 7, 2013 at 9:50 PM, Jean-Marc Spaggiari
> <jean-m...@spaggiari.org
>> wrote:
>
>> Ok, I see that RAID0 might be better for me compared to JBOD. Also, why
>> would we want to use RAID1 or RAID5? We already have redundancy handled
>> by Hadoop; isn't it going to add another unneeded level of redundancy?
>>
>> Should I already plan to have 3 or even 4 drives in each node?
>>
>> I tried sar -A and it's only giving me 2 lines.
>> root@node7:/home/hbase# sar -A
>> Linux 3.2.0-4-amd64 (node7)     2013-02-07      _x86_64_        (4 CPU)
>>
>> 21:29:54          LINUX RESTART
>>
>> It was not enabled, so I just enabled it and restarted sysstat, but it
>> seems it's still not populated.
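>>
>> In case it helps, this is what finally got collection going here (stock
>> Debian sysstat paths):
>>
>>     # sadc only collects once this flag is flipped:
>>     sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
>>     service sysstat restart
>>     # the cron job in /etc/cron.d/sysstat then samples every 10 minutes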
>>
>> I have the diskstats plugin installed in Ganglia, so I have a LOT of
>> disk information, but not this specific metric.
>>
>> My write_bytes_per_sec is pretty low: the average is 232K for the last 2
>> hours. But my read_bytes_per_sec averages 22.83M for the same period.
>> The graph looks like a comb.
>>
>> I just retried sar and some data is coming in... I will need to let it
>> run for a few more minutes to get some more data...
>>
>> JM
>>
>>
>> 2013/2/7, Kevin O'dell <kevin.od...@cloudera.com>:
>> > JM,
>> >
>> >   Okay, I think I see what was happening.  You currently only have one
>> > drive in the system, and it is showing high I/O wait, correct?  You are
>> > looking at bringing in a second drive to help distribute the load?  In
>> > your testing with two drives you saw that RAID0 offered superior
>> > performance vs JBOD.  Typically when we see RAID vs JBOD we are dealing
>> > with about 6 - 12 drives.  Here are some of the pluses and minuses:
>> >
>> > RAID0 - faster performance since the data is striped, but you are only
>> > as fast as your slowest drive, and with one drive failure you lose the
>> > whole volume.
>> >
>> > JBOD - better redundancy and faster than a RAID1 or RAID5 configuration
>> > (unsure about RAID4), but slower than RAID0.
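>> >
>> > For reference, RAID0 here would just be a plain mdadm stripe
>> > (hypothetical device names):
>> >
>> >     mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
>> >     mkfs.ext4 /dev/md0 && mount /dev/md0 /data
>> >     # JBOD is simply each disk formatted and mounted on its own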
>> >
>> > It sounds like, since you only have 1 drive in the node right now, you
>> > wouldn't be gaining or losing any redundancy by going with RAID0.  For
>> > what it is worth, I would agree that you are I/O bound.  If you run
>> > sar -A > /tmp/sar.out and take a look at the drive utilization, what
>> > TPS (IOPS) count are you seeing?
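>> >
>> > If you don't want to wait for the daily file, per-device numbers are
>> > available directly (standard sysstat flags):
>> >
>> >     sar -d -p 5 3     # tps per device, 3 samples 5 seconds apart
>> >     iostat -dx 5 3    # adds %util, i.e. how saturated each drive is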
>> >
>> > On Thu, Feb 7, 2013 at 9:00 PM, Jean-Marc Spaggiari
>> > <jean-m...@spaggiari.org
>> >> wrote:
>> >
>> >> Hi Kevin,
>> >>
>> >> I'm facing some issues on one of my nodes and I'm trying to find a way
>> >> to fix them. The CPU is about 10% user and 80% WIO, so I'm looking for
>> >> a way to improve that. The motherboard can do RAIDx and JBOD too. It's
>> >> the server I used a few weeks ago to run some disk benchmarks.
>> >>
>> >> http://www.spaggiari.org/index.php/hbase/hard-drives-performances
>> >>
>> >> The conclusion was that RAID0 was 70% faster than JBOD. But JBOD was
>> >> faster than RAID1.
>> >>
>> >> I have a 2TB drive in this server and was thinking about just adding
>> >> another 2TB drive.
>> >>
>> >> What are the advantages of JBOD compared to RAID0? From the last tests
>> >> I did, JBOD was slower.
>> >>
>> >> Since I will have to re-format the disks anyway, I can re-run the
>> >> tests, just in case I did not configure something properly...
>> >>
>> >> JM
>> >>
>> >> 2013/2/7, Kevin O'dell <kevin.od...@cloudera.com>:
>> >> > Hey JM,
>> >> >
>> >> >   Why RAID0?  That has a lot of disadvantages compared to a JBOD
>> >> > configuration.  Wait I/O is a symptom, not a problem.  Are you
>> >> > actually experiencing a problem, or are you treating a number you
>> >> > think should be lower?
>> >> >
>> >> > On Thu, Feb 7, 2013 at 8:19 PM, Jean-Marc Spaggiari
>> >> > <jean-m...@spaggiari.org
>> >> >> wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> What is an acceptable CPU_WIO % while running a heavy MR job?
>> >> >> Should we also try to keep it under 10%? Or is that not realistic
>> >> >> and we will see more like 50%?
>> >> >>
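>> >> >> A quick way to watch it live, instead of waiting for sar's daily
>> >> >> file (plain sysstat/procps commands):
>> >> >>
>> >> >>     sar -u 5 3    # the %iowait column is what Ganglia calls CPU_WIO
>> >> >>     vmstat 5      # the 'wa' column shows the same thing live
>> >> >>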
>> >> >> One of my nodes is showing 70% :( That's WAY too much. I will add
>> >> >> another disk tomorrow and put them in RAID0, but I'm wondering how
>> >> >> low I should go?
>> >> >>
>> >> >> JM
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Kevin O'Dell
>> >> > Customer Operations Engineer, Cloudera
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Kevin O'Dell
>> > Customer Operations Engineer, Cloudera
>> >
>>
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>
