We have seen in several of our Hadoop clusters that LVM degrades the
performance of our M/R jobs. I remember a message where Ted Dunning
explained something about this, and since then we have not used LVM
for Hadoop data directories.

As for RAID volumes, the best performance we have achieved is with
RAID 10 for our Hadoop data directories.


On 02/10/2013 09:24 PM, Michael Katzenellenbogen wrote:
Are you able to create multiple RAID0 volumes? Perhaps you can expose
each disk as its own RAID0 volume...

Not sure why or where LVM comes into the picture here ... LVM is on
the software layer and (hopefully) the RAID/JBOD stuff is at the
hardware layer (and in the case of HDFS, LVM will only add unneeded
overhead).

-Michael

On Feb 10, 2013, at 9:19 PM, Jean-Marc Spaggiari
<jean-m...@spaggiari.org> wrote:

The issue is that my motherboard does not do JBOD :( Only RAID is
possible, and I have been fighting with it for the last 48h and am
still not able to make it work... That's why I'm thinking about using
dfs.data.dir instead.

I have 1 drive per node so far and need to move to 2 to reduce I/O wait (WIO).

What would be the advantage of JBOD over dfs.data.dir? I have run some
tests of JBOD vs LVM and have not found any advantage for JBOD so far.

JM

2013/2/10, Michael Katzenellenbogen <mich...@cloudera.com>:
One thought comes to mind: disk failure. With RAID0, if a disk goes
bad you lose your entire array. With JBOD, you lose only one disk.
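
The difference can be put in numbers with a small Python sketch. The
function name, the layout labels, and the block counts are made up for
illustration; it only counts how many block replicas stored on one node
become unavailable when a single disk fails, and says nothing about how
HDFS then re-replicates them from other nodes.

```python
def replicas_lost(blocks_per_disk, failed_disk, layout):
    """Count block replicas lost on one node when a single disk fails.

    blocks_per_disk: number of block replicas held on each disk
    failed_disk:     index of the disk that went bad
    layout:          "raid0" (one striped array) or "jbod" (independent disks)
    """
    if layout == "raid0":
        # Every file is striped across all members, so one dead
        # disk makes the whole array unreadable.
        return sum(blocks_per_disk)
    if layout == "jbod":
        # Disks are independent: only the failed one is affected.
        return blocks_per_disk[failed_disk]
    raise ValueError(f"unknown layout: {layout}")


# Illustrative numbers only: two disks with 1000 replicas each.
blocks = [1000, 1000]
raid0_loss = replicas_lost(blocks, failed_disk=0, layout="raid0")
jbod_loss = replicas_lost(blocks, failed_disk=0, layout="jbod")
print("RAID0:", raid0_loss, "JBOD:", jbod_loss)
```

With two equally filled disks, the striped array loses twice as many
replicas as the independent-disk layout for the same single failure.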

-Michael

On Feb 10, 2013, at 8:58 PM, Jean-Marc Spaggiari
<jean-m...@spaggiari.org> wrote:

Hi,

I have a quick question regarding RAID0 performance vs multiple
dfs.data.dir entries.

Let's say I have 2 x 2TB drives.

I can configure them as 2 separate drives mounted on 2 folders and
assign them to Hadoop using dfs.data.dir. Or I can combine the 2 drives
with RAID0 and assign them as a single folder to dfs.data.dir.

With RAID0, reads and writes are spread over the 2 disks, which
significantly increases speed. But if I put 2 entries in dfs.data.dir,
Hadoop is going to spread data over those 2 directories too, so in the
end the results should be the same, no?
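
To make the second option concrete, here is a minimal Python sketch,
assuming the DataNode picks a directory for each new block in simple
round-robin order. The mount points /data/1/dfs and /data/2/dfs and the
function name are hypothetical, and this is a model of the placement
idea, not Hadoop's actual code.

```python
from collections import Counter
from itertools import cycle


def place_blocks(num_blocks, data_dirs):
    """Assign each new block to a directory in round-robin order."""
    chooser = cycle(data_dirs)
    return [next(chooser) for _ in range(num_blocks)]


# Hypothetical mount points, one per physical drive, as they might
# be listed (comma-separated) in dfs.data.dir.
dirs = ["/data/1/dfs", "/data/2/dfs"]
placement = place_blocks(10, dirs)

# Count how many blocks ended up on each drive.
per_dir = Counter(placement)
print(per_dir)
```

Under this assumption the blocks end up evenly split between the two
drives, but each block lives entirely on one drive. RAID0 instead
stripes every block across both drives, so the two layouts balance load
at different granularities.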

Any experience/advice/results to share?

Thanks,

JM

--
Marcos Ortiz Valmaseda,
Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
