If you specify a list of directories in the property "dfs.datanode.data.dir", Hadoop
will distribute the data blocks among all those disks; it will not
replicate data between them. If you want to use the disks as a single
volume, you would need to build an LVM array (or use some other solution) to present
them as a single disk to the OS.
However, benchmarks show that specifying a list of disks and letting
Hadoop distribute data among them gives better performance.
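As a rough sketch, assuming the usual hdfs-site.xml override file and the mount points from your example (adjust the paths to your actual mounts), the property would look like this:

    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/vol1/hadoop/data,/vol2/hadoop/data,/vol3/hadoop/data</value>
    </property>

With that, the datanode spreads new blocks across the listed directories (round-robin by default), so the capacity of the disks is aggregated rather than mirrored.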
On 13/05/14 17:12, Marcos Sousa wrote:
Yes,
I don't want to replicate, I just want to use them as one disk. Isn't it possible to
make this work?
Best regards,
Marcos
On Tue, May 13, 2014 at 6:55 AM, Rahul Chaudhari
<rahulchaudhari0...@gmail.com>
wrote:
Marcos,
While configuring Hadoop, the "dfs.datanode.data.dir" property
in hdfs-default.xml should have this list of disks specified on
separate lines. If you specify a comma-separated list, it will
replicate on all those disks/partitions.
_Rahul
Sent from my iPad
> On 13-May-2014, at 12:22 am, Marcos Sousa
<falecom...@marcossousa.com>
wrote:
>
> Hi,
>
> I have 20 servers, each with 10 HDs of 400GB SATA. I'd like to use
them as my datanodes:
>
> /vol1/hadoop/data
> /vol2/hadoop/data
> /vol3/hadoop/data
> /volN/hadoop/data
>
> How do I use those distinct disks without replicating between them?
>
> Best regards,
>
> --
> Marcos Sousa
--
Marcos Sousa
www.marcossousa.com Enjoy it!
--
Aitor PĂ©rez
Big Data System Engineer
Telf.: +34 917 680 490
Fax: +34 913 833 301
C/Manuel Tovar, 49-53 - 28034 Madrid - Spain
http://www.bidoop.es