Re: Multiple dfs.data.dir vs RAID0

2013-02-11 Thread Jean-Marc Spaggiari
Thanks all for your feedback. I have updated the HDFS config to add another dfs.data.dir entry and restarted the node. Hadoop is starting to use the entry, but is not spreading the existing data over the 2 directories. Let's say you have a 2TB disk on /hadoop1, almost full. If you add another
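The change described above amounts to adding a second comma-separated path to dfs.data.dir in hdfs-site.xml. The mount points below are assumptions based on the /hadoop1 example in the thread; a sketch of what the property would look like:

```xml
<!-- hdfs-site.xml: two data directories, one per physical disk.
     /hadoop1 and /hadoop2 are hypothetical mount points.
     The datanode round-robins NEW blocks across the listed directories,
     but (as observed in this thread) does not redistribute existing blocks. -->
<property>
  <name>dfs.data.dir</name>
  <value>/hadoop1/dfs/data,/hadoop2/dfs/data</value>
</property>
```

Note the value is a comma-separated list with no spaces; each directory should sit on its own disk for the I/O-spreading effect to matter.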

Re: Multiple dfs.data.dir vs RAID0

2013-02-11 Thread Michael Katzenellenbogen
On Mon, Feb 11, 2013 at 10:54 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Thanks all for your feedback. I have updated the HDFS config to add another dfs.data.dir entry and restarted the node. Hadoop is starting to use the entry, but is not spreading the existing data over the 2

Re: Multiple dfs.data.dir vs RAID0

2013-02-11 Thread Olivier Renault
You can also rebalance the disks using the steps described in the FAQ: http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F Olivier On 11 February 2013 15:54, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Thanks all for your feedback.
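The FAQ procedure being linked to is manual: stop the datanode, move block files between data directories while preserving their subdirectory layout, then restart. A toy, runnable sketch of the move step, using throwaway directories in /tmp instead of real mounts (real paths would be something like /hadoop1/dfs/data and /hadoop2/dfs/data, and the datanode must be stopped first):

```shell
# Toy demonstration of the manual intra-node rebalance from the Hadoop FAQ.
# With the datanode STOPPED, block files (and their .meta companions) can be
# moved between data directories as long as the directory layout is preserved.
OLD=/tmp/rebalance-demo/hadoop1/dfs/data/current
NEW=/tmp/rebalance-demo/hadoop2/dfs/data/current
mkdir -p "$OLD" "$NEW"

# Fake a few HDFS block files on the "full" disk.
for i in 1 2 3 4; do
  touch "$OLD/blk_100$i" "$OLD/blk_100${i}_1001.meta"
done

# Move half of the blocks, each together with its .meta file, to the new disk.
for i in 1 2; do
  mv "$OLD/blk_100$i" "$OLD/blk_100${i}_1001.meta" "$NEW/"
done

ls "$NEW"   # the moved blocks now live under the second data directory
```

On restart the datanode picks up the blocks from their new location; moving a block without its .meta file would make it unreadable.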

Multiple dfs.data.dir vs RAID0

2013-02-10 Thread Jean-Marc Spaggiari
Hi, I have a quick question regarding RAID0 performance vs multiple dfs.data.dir entries. Let's say I have 2 x 2TB drives. I can configure them as 2 separate drives mounted on 2 folders and assigned to Hadoop using dfs.data.dir. Or I can mount the 2 drives with RAID0 and assign them as a

Re: Multiple dfs.data.dir vs RAID0

2013-02-10 Thread Michael Katzenellenbogen
One thought comes to mind: disk failure. In the event a disk goes bad, with RAID0 you just lost your entire array; with JBOD, you lost one disk. -Michael On Feb 10, 2013, at 8:58 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi, I have a quick question regarding RAID0

Re: Multiple dfs.data.dir vs RAID0

2013-02-10 Thread Jean-Marc Spaggiari
The issue is that my motherboard is not doing JBOD :( Only RAID is possible, and I have been fighting with it for the last 48h and am still not able to make it work... That's why I'm thinking about using dfs.data.dir instead. I have 1 drive per node so far and need to move to 2 to reduce I/O wait (WIO). What will be better with

Re: Multiple dfs.data.dir vs RAID0

2013-02-10 Thread Chris Embree
Interesting question. You'd probably need to benchmark to prove it out. I'm not sure of the exact details of how HDFS stripes data, but it should compare pretty well to hardware RAID. Conceptually, HDFS should be able to outperform a RAID solution, since HDFS knows more about the data being written.

Re: Multiple dfs.data.dir vs RAID0

2013-02-10 Thread Michael Katzenellenbogen
Are you able to create multiple RAID0 volumes? Perhaps you can expose each disk as its own RAID0 volume... Not sure why or where LVM comes into the picture here... LVM is at the software layer and (hopefully) the RAID/JBOD stuff is at the hardware layer (and in the case of HDFS, LVM will only

Re: Multiple dfs.data.dir vs RAID0

2013-02-10 Thread Jean-Marc Spaggiari
@Michael: I have done some tests between RAID0, RAID1, JBOD and LVM on another server. Results are here: http://www.spaggiari.org/index.php/hbase/hard-drives-performances LVM and JBOD were close; that's why I talked about LVM, since it seems to be pretty close to JBOD performance-wise and can be
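For rough per-mount numbers like those on the linked page, a quick sequential-write probe on each mount point is enough to compare RAID0 / JBOD / LVM layouts. The sketch below writes to a temp directory as a stand-in for a real mount point like /hadoop1 (an assumption, so it runs anywhere); a serious comparison should use a proper tool such as fio or bonnie++:

```shell
# Crude sequential-write probe for one mount point.
# TARGET would normally sit on the disk under test (e.g. /hadoop1);
# /tmp is used here only so the sketch is runnable anywhere.
TARGET=${TARGET:-/tmp/dd-bench}
mkdir -p "$TARGET"

# conv=fdatasync forces the data to the device before dd reports a rate,
# so the MB/s figure reflects the disk rather than the page cache.
dd if=/dev/zero of="$TARGET/testfile" bs=1M count=64 conv=fdatasync

rm -f "$TARGET/testfile"
```

Running the same command against each candidate mount gives directly comparable throughput figures for the write path; reads would need a separate probe after dropping caches.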

Re: Multiple dfs.data.dir vs RAID0

2013-02-10 Thread Ted Dunning
Typical best practice is to have a separate file system per spindle. If you have a RAID-only controller (many are), then you just create one RAID0 per spindle. The effect is the same. MapR is unusual in being able to stripe writes over multiple drives organized into a storage pool, but you will not
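On the OS side, the one-RAID0-per-spindle workaround described above means each single-disk volume appears as its own block device and gets its own filesystem and mount point; the device names and mount points below are hypothetical, shown as an /etc/fstab fragment:

```
# /etc/fstab -- one filesystem per single-disk RAID0 volume (hypothetical devices)
/dev/sdb1  /hadoop1  ext4  defaults,noatime  0 0
/dev/sdc1  /hadoop2  ext4  defaults,noatime  0 0
```

Each mount point is then listed in dfs.data.dir (e.g. /hadoop1/dfs/data,/hadoop2/dfs/data), giving the same per-spindle layout as true JBOD.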