Balancer not balancing 100%?

2008-05-11 Thread Otis Gospodnetic
Hi, I have 4 identical nodes in a Hadoop cluster (all functioning as DNs). One of the 4 nodes is a new node that I recently added. I ran the balancer a few times and it did move some of the blocks from the other 3 nodes to the new node. However, the 4 nodes are still not 100% balanced

Re: Balancer not balancing 100%?

2008-05-11 Thread Otis Gospodnetic
Oh, and on top of the above, I just observed that even though bin/hadoop balancer exits immediately and reports the cluster is fully balanced, I do see *very* few blocks (1-2 blocks per node) getting moved every time I run balancer. It feels as if the balancer does actually find some blocks
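The behavior described above is consistent with the balancer's threshold: it considers a datanode balanced once its utilization is within a threshold (in percentage points, default 10) of the cluster average, so it can report "balanced" while nodes still differ. A sketch of forcing a tighter balance (the threshold value here is illustrative, not a recommendation):

```shell
# The balancer stops once every datanode's utilization is within the
# threshold of the cluster-wide average; the default threshold is 10.
# Passing a smaller value makes it move more blocks before exiting.
bin/hadoop balancer -threshold 5
```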

Problems saving to s3 using distcp

2008-05-11 Thread James Moore
Just upgraded to 0.16.4, and tried a distcp to s3. I'm seeing many errors - according to the jobtracker, 8,008 files were copied, but 5,880 were skipped. I assume that the number of skipped files needs to be 0 for a successful copy. And 56 maps failed (log file given below). Is there
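For reference, a hedged sketch of the kind of invocation involved (the hostnames, bucket, and credentials are placeholders): distcp skips a destination file it believes already matches the source, so skipped files on a retry are not necessarily failures, and `-i` lets the job continue past individual copy errors instead of failing whole maps.

```shell
# Illustrative only - namenode address, bucket, and AWS credentials are
# placeholders. -i ignores per-file failures so one bad copy does not
# kill the map; -overwrite would force re-copying files distcp would
# otherwise skip as already present at the destination.
bin/hadoop distcp -i hdfs://namenode:9000/data s3://ID:SECRET@mybucket/data
```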

HDFS corrupt...how to proceed?

2008-05-11 Thread C G
Hi All: We had a primary node failure over the weekend. When we brought the node back up and I ran Hadoop fsck, I see the file system is corrupt. I'm unsure how best to proceed. Any advice is greatly appreciated. If I've missed a Wiki page or documentation somewhere please feel free
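When fsck reports corruption right after a restart, a more detailed report helps distinguish blocks that are truly gone from blocks on datanodes that simply have not re-registered with the namenode yet. A minimal sketch (the path is illustrative):

```shell
# Per-file detail: which files are affected, which blocks each file has,
# and which datanodes currently report holding each block.
bin/hadoop fsck / -files -blocks -locations
```

Re-running this after all datanodes have checked back in will often show fewer missing blocks than the first pass taken immediately after the reboot.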

Re: HDFS corrupt...how to proceed?

2008-05-11 Thread Dhruba Borthakur
Did one datanode fail or did the namenode fail? By fail do you mean that the system was rebooted or was there a bad disk that caused the problem? thanks, dhruba

Re: Read timed out, Abandoning block blk_-5476242061384228962

2008-05-11 Thread Dhruba Borthakur
You bring up an interesting point. A big chunk of the Namenode code executes inside a global lock, although there are pieces (e.g. a portion of the code that chooses datanodes for a newly allocated block) that do execute outside this lock. But it is probably the case that the namenode does

Re: HDFS corrupt...how to proceed?

2008-05-11 Thread C G
The system hosting the namenode experienced an OS panic and shut down; we subsequently rebooted it. Currently we don't believe there is/was a bad disk or other hardware problem. Something interesting: I've run fsck twice; the first time it gave the result I posted. The second time I

Re: HDFS corrupt...how to proceed?

2008-05-11 Thread Dhruba Borthakur
Is it possible that new files were being created by running applications between the first and second fsck runs? thanks, dhruba

Re: HDFS corrupt...how to proceed?

2008-05-11 Thread C G
Yes, several of our logging apps had accumulated backlogs of data and were eager to write to HDFS.