Re: what is dfs.fileSystem.TotalLoad?

2009-12-24 Thread Edward Capriolo
Fu-Ming, While some JMX attributes are self-explanatory, most could benefit from a one-line description and a paragraph-long description. Usually your best bet is to grep the Hadoop source code for the attribute and try to understand what it is. From the name I just assumed this was a measur
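
The grep approach suggested above can be sketched roughly as follows. This is a minimal illustration, not a Hadoop tool: the function name `find_attribute` and the choice to scan only `.java` files are assumptions made for the example. The full dotted metric name may not appear verbatim in the source, so searching for its last component (here `TotalLoad`) is also an assumption.

```python
import os


def find_attribute(root, needle, suffixes=(".java",)):
    """Return (path, line_number, stripped_line) for each line containing needle.

    root     -- directory of a source checkout to scan
    needle   -- plain substring to look for, e.g. "TotalLoad"
    suffixes -- file name endings to include in the scan
    """
    suffixes = tuple(suffixes)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_attribute(source_dir, "TotalLoad")`, where `source_dir` is wherever the Hadoop source was unpacked, lists candidate definitions to read through; the surrounding code and comments are then the place to look for what the value actually counts.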

Re: Secondary NameNodes or NFS exports?

2009-12-24 Thread Jason Venner
In my test case, the checkpoints take a small number of seconds or less. On Thu, Dec 24, 2009 at 10:34 AM, Todd Lipcon wrote: > How long does the checkpoint take? It seems possible to me that if the 2NN > checkpoint takes longer than the interval, it's possible that multiple > checkpoints will o

Re: Secondary NameNodes or NFS exports?

2009-12-24 Thread Todd Lipcon
How long does the checkpoint take? It seems possible to me that if the 2NN checkpoint takes longer than the interval, multiple checkpoints will overlap and might trigger this. (This is conjecture, so definitely worth testing.) -Todd On Wed, Dec 23, 2009 at 6:38 PM, Jason Venner

Exception in getSendBufferSize java.net.SocketException: Protocol not available

2009-12-24 Thread Konda Ankireddyapalli
Hello all, I am new to Hadoop and have been trying to use the latest stable release (0.20.1). When I tried the grep example in pseudo-distributed mode, the hadoop fs -put command resulted in a SocketException; Socket's getSendBufferSize method throws this error. Here is the relevant log.
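
Separately from the poster's log, a small host-level check may help narrow this kind of error down. Java's `Socket.getSendBufferSize()` reads the `SO_SNDBUF` socket option, and on Linux "Protocol not available" is the usual message for the `ENOPROTOOPT` error. The sketch below is in Python rather than Java, and the helper name `probe_send_buffer_size` is made up for the example. It only checks whether the operating system lets a plain TCP socket report that option; it does not reproduce what Hadoop does.

```python
import errno
import socket


def probe_send_buffer_size():
    """Try to read SO_SNDBUF on a fresh, unconnected TCP socket.

    Returns (size, None) on success, or (None, error) if the OS refuses.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        return size, None
    except OSError as exc:
        return None, exc
    finally:
        sock.close()


if __name__ == "__main__":
    size, err = probe_send_buffer_size()
    if err is None:
        print("SO_SNDBUF is readable:", size, "bytes")
    elif err.errno == errno.ENOPROTOOPT:
        print("The OS reports the option as unavailable:", err)
    else:
        print("Unexpected socket error:", err)
```

If the option is readable here but the Java call still fails, the cause more likely lies in the JVM or its environment than in the operating system's basic socket support.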

what is dfs.fileSystem.TotalLoad?

2009-12-24 Thread Fu-Ming Tsai
Dear all, I've installed Hadoop and use Ganglia to monitor our cluster. Sometimes the value of dfs.fileSystem.TotalLoad reported by Ganglia is around 4000. I don't know what it means. Does anyone know? Br, Fu-Ming -- "Losers always whine about their best." Fu-Ming

Jobs stop at 0%

2009-12-24 Thread Raymond Jennings III
I have recently been seeing a problem where jobs that previously worked fine stop at map 0% (with no code changes). Restarting Hadoop on the cluster solves the problem, but there is nothing in the log files to indicate what the problem is. Has anyone seen something similar?