This is a very atypical node for Hadoop clusters. 8 cores is not that
uncommon, but normally nodes have only 2 disks. The rationale for a small
number of spindles per machine is that total disk bandwidth scales with the
number of buses and controllers rather than the number of spindles. The same
thi
Yes, it was 1 MB/s (I was foolish), and it shouldn't affect the result.
Also, the log for the 4th node says it wasn't working properly as a datanode,
and regardless of the replication factor, data was replicated to only 3 of 4
nodes (I was foolish again :-().
My single node has 4 dual-core CPUs, and 24 hard d
On Fri, Jul 17, 2009 at 4:16 PM, Seunghwa Kang wrote:
I checked with
bin/hadoop fs -stat "%n %r" input/*
part-0 4
part-1 4
part-2 4
part-3 4
part-4 4
part-5 4
part-6 4
part-7 4
and see that the replication factor is 4.
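A minimal sketch of automating that check, assuming the two-column output of `-stat "%n %r"` shown above (file name, then replication factor). The function names `parse_stat_output` and `find_mismatched` are illustrative and not part of Hadoop.

```python
def parse_stat_output(text):
    """Parse lines of `hadoop fs -stat "%n %r"` output into {name: replication}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The replication factor is the last whitespace-separated field;
        # everything before it is taken as the file name.
        name, rep = line.rsplit(None, 1)
        result[name] = int(rep)
    return result


def find_mismatched(replication_by_file, target):
    """Return the sorted names of files whose replication factor differs from target."""
    return sorted(n for n, r in replication_by_file.items() if r != target)


# Output copied from the message above.
STAT_OUTPUT = """\
part-0 4
part-1 4
part-2 4
part-3 4
part-4 4
part-5 4
part-6 4
part-7 4
"""

reps = parse_stat_output(STAT_OUTPUT)
print(find_mismatched(reps, target=4))  # prints []
```

Note that `%r` reports the replication factor recorded for each file, which is a target value; on its own it does not confirm how many replicas actually exist on the datanodes.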
Also, I set the replication factor to 4 in hadoop-site.xml, ran stop-all.sh
and start-all.sh, re-loaded the data,
Does [hadoop fsck /] show any under-replicated files/blocks? You may
not have waited long enough after increasing the target replication factor.
Another thing to watch out for in a production node is the distribution of
node blocks. You should be careful to load data from outside the cluster to
ens
Hi Seunghwa,
It's important to note that changing the dfs.replication config variable
does not change the current files in HDFS. You have to use fs -setrep on
those files to change their replication count. The replication count is set
when the files are created and is not modified thereafter unless
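As a rough illustration of the `fs -setrep` step, the sketch below builds the command line for changing the replication factor of files that already exist. The `input` path and the factor of 4 come from this thread; the helper name `setrep_command` is made up for the example, and the `-R` (recursive) and `-w` (wait until replication completes) flags are assumptions that should be checked against `bin/hadoop fs -help setrep` on your version.

```python
import shlex


def setrep_command(path, replication, recursive=True, wait=False,
                   hadoop="bin/hadoop"):
    """Build the argv list for `hadoop fs -setrep` (hypothetical helper)."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    argv = [hadoop, "fs", "-setrep"]
    if recursive:
        argv.append("-R")  # assumed flag: apply to everything under a directory
    if wait:
        argv.append("-w")  # assumed flag: block until the target is reached
    argv += [str(replication), path]
    return argv


cmd = setrep_command("input", 4)
# Print the command instead of running it, so it can be reviewed first.
print(" ".join(shlex.quote(a) for a in cmd))
```

The list form can be passed to `subprocess.run` directly once the command looks right; printing it first keeps the sketch free of side effects on the cluster.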
I forgot to mention my Hadoop version.
I am using 0.19.1.
Thanks again,
-seunghwa
On Fri, 2009-07-17 at 18:57 -0400, Seunghwa Kang wrote:
Hello,
I am running Hadoop on my 4-node system.
Initially, I picked a replication factor of 2, and nearly 100% of map
tasks run locally with up to 3 nodes, but the ratio drops to 80% if I use
all 4 nodes.
As my nodes have quite high I/O bandwidth (24 disks per node), but only
limited network bandwi