Re: Data-local map tasks lower than Launched map tasks even with full replication

2009-07-17 Thread Ted Dunning
This is a very atypical node for Hadoop clusters. 8 cores is not that uncommon, but normally nodes have only 2 disks. The rationale for a small number of spindles per machine is that total disk bandwidth scales with the number of buses and controllers rather than the number of spindles. The same thi

Re: Data-local map tasks lower than Launched map tasks even with full replication

2009-07-17 Thread Seunghwa Kang
Yes, it was 1 MB/s (I was foolish), and it shouldn't affect the result. Also, the log for the 4th node says it wasn't working properly as a datanode, so regardless of the replication factor, data is replicated to only 3 of the 4 nodes (I was foolish again :-(). My single node has 4 dual-core CPUs, and 24 hard d

Re: Data-local map tasks lower than Launched map tasks even with full replication

2009-07-17 Thread Todd Lipcon
On Fri, Jul 17, 2009 at 4:16 PM, Seunghwa Kang wrote:
> I checked with
>
> bin/hadoop fs -stat "%n %r" input/*
>
> part-0 4
> part-1 4
> part-2 4
> part-3 4
> part-4 4
> part-5 4
> part-6 4
> part-7 4
>
> and see replication factor is 4.
>
> Also, I set replication

Re: Data-local map tasks lower than Launched map tasks even with full replication

2009-07-17 Thread Seunghwa Kang
I checked with

bin/hadoop fs -stat "%n %r" input/*

part-0 4
part-1 4
part-2 4
part-3 4
part-4 4
part-5 4
part-6 4
part-7 4

and see replication factor is 4.

Also, I set replication factor to 4 in hadoop-site.xml, run stop-all.sh and start-all.sh, re-load the data,

Re: Data-local map tasks lower than Launched map tasks even with full replication

2009-07-17 Thread Ted Dunning
Does [hadoop fsck /] show any under-replicated files/blocks? You may not have waited long enough after increasing the target replication rate. Another thing to watch out for in a production node is the distribution of node blocks. You should be careful to load data from outside the cluster to ens

Re: Data-local map tasks lower than Launched map tasks even with full replication

2009-07-17 Thread Todd Lipcon
Hi Seunghwa, It's important to note that changing the dfs.replication config variable does not change the current files in HDFS. You have to use fs -setrep on those files to change their replication count. The replication count is set when the files are created and is not modified thereafter unless
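
An editorial sketch of the steps described in this message, for files that already exist in HDFS: raise their replication with fs -setrep, then confirm the result with fs -stat and fsck. The input path and the target of 4 are taken from this thread. The run wrapper, the variable names, and the dry-run fallback are assumptions added for illustration, so the script only prints the commands when no hadoop binary is found.

```shell
#!/bin/sh
# Sketch only: variable names and the dry-run wrapper are hypothetical.
HADOOP="${HADOOP:-bin/hadoop}"     # path to the hadoop launcher (assumed)
TARGET_DIR="${TARGET_DIR:-input}"  # HDFS directory from the thread
REPL="${REPL:-4}"                  # desired replication factor

# Run the command if the hadoop launcher is available; otherwise just
# print what would have been run.
run() {
  if command -v "$HADOOP" >/dev/null 2>&1; then
    "$@"
  else
    printf '%s\n' "dry-run: $*"
  fi
}

# Change the replication of existing files (-R: recurse, -w: wait until done).
run "$HADOOP" fs -setrep -R -w "$REPL" "$TARGET_DIR"

# Print each file name with its replication factor. The glob is quoted so
# that HDFS, not the local shell, expands it.
run "$HADOOP" fs -stat "%n %r" "$TARGET_DIR/*"

# Report under-replicated blocks, if any.
run "$HADOOP" fsck /
```

Changing dfs.replication in hadoop-site.xml only affects files written afterwards, which is why the explicit -setrep step is needed for data that was already loaded.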

Re: Data-local map tasks lower than Launched map tasks even with full replication

2009-07-17 Thread Seunghwa Kang
I found I forgot to mention my hadoop version. I am using 0.19.1.

Thanks again,
-seunghwa

On Fri, 2009-07-17 at 18:57 -0400, Seunghwa Kang wrote:
> Hello,
>
> I am running Hadoop on my 4 nodes system.
>
> Initially, I pick the replication factor of 2, and nearly 100% of map
> tasks run in loc

Data-local map tasks lower than Launched map tasks even with full replication

2009-07-17 Thread Seunghwa Kang
Hello, I am running Hadoop on my 4-node system. Initially, I picked a replication factor of 2, and nearly 100% of map tasks ran locally with up to 3 nodes, but the ratio drops to 80% if I use all 4 nodes. As my nodes have quite high I/O bandwidth (24 disks per node), but only limited network bandwi