If you specify a list of directories in the property "dfs.datanode.data.dir", Hadoop will distribute the data blocks among all those disks; it will not replicate data between them. If you want to use the disks as a single one, you have to create an LVM volume (or use any other solution that presents them to the OS as a single disk).
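
To spell out the LVM option, here is a minimal sketch; the device names (/dev/sdb, /dev/sdc, ...), volume group and logical volume names, filesystem, and mount point are assumptions you would adapt to your own hosts:

    # Register the raw disks as LVM physical volumes (device names assumed)
    pvcreate /dev/sdb /dev/sdc /dev/sdd

    # Group them into a single volume group
    vgcreate hadoop_vg /dev/sdb /dev/sdc /dev/sdd

    # Create one logical volume spanning all free space in the group
    lvcreate -l 100%FREE -n data_lv hadoop_vg

    # Format and mount it so the OS sees a single disk
    mkfs.ext4 /dev/hadoop_vg/data_lv
    mount /dev/hadoop_vg/data_lv /hadoop/data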

However, benchmarks show that specifying a list of disks and letting Hadoop distribute data among them gives better performance.
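
For reference, a minimal hdfs-site.xml sketch of that setup, using the mount points from Marcos's message (the value is a single comma-separated list; note that site overrides belong in hdfs-site.xml, not hdfs-default.xml):

    <property>
      <name>dfs.datanode.data.dir</name>
      <!-- Blocks are spread round-robin across these directories, not replicated -->
      <value>/vol1/hadoop/data,/vol2/hadoop/data,/vol3/hadoop/data</value>
    </property>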

On 13/05/14 17:12, Marcos Sousa wrote:
Yes,

I don't want to replicate, just use them as one disk. Isn't it possible to make this work?

Best regards,

Marcos


On Tue, May 13, 2014 at 6:55 AM, Rahul Chaudhari <rahulchaudhari0...@gmail.com> wrote:

    Marcos,
        While configuring Hadoop, the "dfs.datanode.data.dir" property
    in hdfs-default.xml should have this list of disks specified on
    separate lines. If you specify a comma-separated list, it will
    replicate on all those disks/partitions.

    _Rahul
    Sent from my iPad

    > On 13-May-2014, at 12:22 am, Marcos Sousa <falecom...@marcossousa.com> wrote:
    >
    > Hi,
    >
    > I have 20 servers, each with 10 400GB SATA HDs. I'd like to use
    them for my datanodes:
    >
    > /vol1/hadoop/data
    > /vol2/hadoop/data
    > /vol3/hadoop/data
    > /volN/hadoop/data
    >
    > How do I use those distinct disks without replicating between them?
    >
    > Best regards,
    >
    > --
    > Marcos Sousa




--
Marcos Sousa
www.marcossousa.com Enjoy it!

--
Aitor PĂ©rez
Big Data System Engineer

Tel.: +34 917 680 490
Fax: +34 913 833 301
C/Manuel Tovar, 49-53 - 28034 Madrid - Spain

http://www.bidoop.es
