Moving this to user@ since it's not appropriate for general@.

On Fri, Sep 28, 2012 at 11:16 PM, Xiang Hua <bea...@gmail.com> wrote:
> Hi,
> i want to select 4(600G) local disks combined with 3*800G disks form
> diskarray in one datanode.
> is there any problem? performance ?

The recommended configuration would be to partition and format each disk
with ext4, then set dfs.datanode.data.dir to point to the mountpoints of
each disk (three are shown here; list one entry per disk):

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/datadir,/data/2/datadir,/data/3/datadir</value>
  </property>

You may also want to set dfs.datanode.du.reserved to 1GB or thereabouts.

With this configuration your DN will fill all 7 datadirs at the same rate,
pseudorandomly, until the 600G disks are nearly full; after that it will
write any further blocks to the 800G disks. Performance will be OK, except
that you will see hot-spots on the larger disks once each disk is filled
past the 600GB mark. See https://issues.apache.org/jira/browse/HDFS-1564
for one missing feature in this area.

I would not recommend using RAID-0 for the datadirs. If you experience a
disk failure with independent filesystems, only the blocks on one datadir
are lost and need to be rereplicated. If you experience a disk failure
with RAID-0, all blocks stored on that DN are lost and need to be
rereplicated.

Also, RAID results in performance lockstep: a single slow disk will slow
down access to all blocks on that DN, while with independent filesystems a
single slow disk slows down only a fraction of the blocks on that DN.

-andy
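
For illustration, a fuller hdfs-site.xml sketch for the seven-disk case
described above might look roughly like the following. It assumes the
disks are mounted at /data/1 through /data/7; the mountpoints /data/4
to /data/7 are hypothetical names that simply extend the pattern of the
three-entry example. It also assumes "1GB" is meant as 2^30 bytes, since
dfs.datanode.du.reserved is given in bytes and applies per volume. Adjust
both to match the actual machine.

```xml
<!-- Sketch only: one independent ext4 filesystem per physical disk, no RAID. -->
<!-- Mountpoints /data/4 .. /data/7 are assumed names, not from the thread. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/datadir,/data/2/datadir,/data/3/datadir,/data/4/datadir,/data/5/datadir,/data/6/datadir,/data/7/datadir</value>
</property>

<!-- Space (in bytes) left free on each volume for non-HDFS use. -->
<!-- 1073741824 = 1 GB, assuming binary gigabytes. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>1073741824</value>
</property>
```

Listing each disk as its own entry is what keeps a single-disk failure
confined to the blocks on that one datadir.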