Re: Questions about the MapReduce libraries and job schedulers inside JobTracker and JobClient running on Hadoop

2008-02-15 Thread Andy Li
Thanks for both inputs. My question actually focuses more on what Vivek has mentioned. I would like to work on the JobClient to see how it submits jobs to different file systems and slaves in the same Hadoop cluster. Not sure if there is a complete document to explain the scheduler underneath Hadoop…

Re: dfsadmin reporting wrong disk usage numbers

2008-02-15 Thread Konstantin Shvachko
Yes, please file a bug. There are file systems with different block sizes out there, on Linux or Solaris. Thanks, --Konstantin Martin Traverso wrote: I think I found the issue. The class org.apache.hadoop.fs.DU assumes 1024-byte blocks when reporting usage information: this.used = Long.parseLong(tokens[0])*1024;…

Re: How to config if ssh server port is not default

2008-02-15 Thread Raghu Angadi
david wrote: > I mean I used the IBM MapReduce Tools plugin for Eclipse to connect to the Hadoop server. > Originally Eclipse showed the connection failing; then I found the problem: my Hadoop > server's ssh port is not the default 22. Eclipse connects to Hadoop successfully if I > change it back to port 22. > > So how can I use Eclipse…

RE: Questions about the MapReduce libraries and job schedulers inside JobTracker and JobClient running on Hadoop

2008-02-15 Thread Vivek Ratan
I read Andy's question a little differently. For a given job, the JobTracker decides which tasks go to which TaskTracker (the TTs ask for a task to run and the JT decides which task is the most appropriate). Currently, the JT favors a task whose input data is on the same host as the TT (if there are…
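For reference, a heavily simplified Java sketch of the locality preference described above (illustrative only, not the actual JobTracker code): when a TaskTracker asks for work, prefer a pending task whose input split has a replica on that host, otherwise hand out any pending task.

    // Illustrative only -- not the real JobTracker scheduler. It just shows the
    // "prefer a task whose input split lives on the requesting host" idea.
    import java.util.List;

    class LocalityFirstScheduler {
      static class PendingTask {
        final String id;
        final List<String> splitHosts;   // hosts holding replicas of the input split
        PendingTask(String id, List<String> splitHosts) {
          this.id = id;
          this.splitHosts = splitHosts;
        }
      }

      /** Pick a task for the TaskTracker running on ttHost. */
      static PendingTask pickTask(String ttHost, List<PendingTask> pending) {
        for (PendingTask t : pending) {
          if (t.splitHosts.contains(ttHost)) {
            return t;                                       // data-local task available
          }
        }
        return pending.isEmpty() ? null : pending.get(0);   // otherwise any pending task
      }
    }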

Re: dfsadmin reporting wrong disk usage numbers

2008-02-15 Thread Martin Traverso
I think I found the issue. The class org.apache.hadoop.fs.DU assumes 1024-byte blocks when reporting usage information: this.used = Long.parseLong(tokens[0])*1024; This works fine on Linux, but on Solaris and Mac OS the reported number of blocks is based on 512-byte blocks. The solution is simple…
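For reference, a self-contained Java sketch of one way to remove the block-size assumption (not necessarily the fix Martin goes on to propose, since his message is truncated above): invoke du with -sk, which reports 1024-byte units on Linux, Solaris, and Mac OS X alike, so the multiplier no longer depends on the platform.

    // Sketch only -- not the actual org.apache.hadoop.fs.DU code.
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class DiskUsage {
      /** Returns the bytes used under 'path', as reported by "du -sk". */
      public static long usedBytes(String path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("du", "-sk", path).start();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line = r.readLine();
        r.close();
        p.waitFor();
        if (line == null) {
          throw new IOException("no output from du");
        }
        // Output looks like "<kilobytes><whitespace><path>"; -k fixes the unit at 1024 bytes.
        return Long.parseLong(line.split("\\s+")[0]) * 1024L;
      }
    }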

Re: dfsadmin reporting wrong disk usage numbers

2008-02-15 Thread Martin Traverso
> What are the data directories specified in your configuration? Have you specified two data directories per volume? No, just one directory per volume. This is the value of dfs.data.dir in my hadoop-site.xml: dfs.data.dir /local/data/hadoop/d0/dfs/data,/local/data/ha…
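For reference, a hedged hadoop-site.xml sketch of how a multi-volume dfs.data.dir entry is usually written; the d0/d1 paths below are illustrative, not the (truncated) value from the message above.

    <!-- Illustrative paths; one directory per physical volume. -->
    <property>
      <name>dfs.data.dir</name>
      <value>/local/data/hadoop/d0/dfs/data,/local/data/hadoop/d1/dfs/data</value>
      <description>Comma-separated list of local directories, one per volume,
      in which the datanode stores its blocks.</description>
    </property>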

Re: Questions about the MapReduce libraries and job schedulers inside JobTracker and JobClient running on Hadoop

2008-02-15 Thread Ted Dunning
Core-user is the right place for this question. Your description is mostly correct. Jobs don't necessarily go to all of your boxes in the cluster, but they may. Non-uniform machine specs are a bit of a problem that is being (has been?) addressed by allowing each machine to have a slightly different…
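For reference, one common example of such a per-machine difference is letting a larger node run more simultaneous tasks through its local hadoop-site.xml; a hedged sketch, assuming mapred.tasktracker.tasks.maximum is the slot-count property for Hadoop releases of this era.

    <!-- Hedged per-node override in this machine's hadoop-site.xml;
         the property name is an assumption for this Hadoop version. -->
    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>4</value>
      <description>Run up to 4 simultaneous tasks on this (larger) machine.</description>
    </property>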

Re: dfsadmin reporting wrong disk usage numbers

2008-02-15 Thread Hairong Kuang
The datanode runs du on its data directories hourly. In between two "du"s, the used space is updated when a block is added or deleted. What are the data directories specified in your configuration? Have you specified two data directories per volume? Hairong On 2/15/08 1:05 PM, "Martin Traverso" <[EMAIL PROTECTED]>…

Questions about the MapReduce libraries and job schedulers inside JobTracker and JobClient running on Hadoop

2008-02-15 Thread Andrew_Lee
Hello, this is my first time posting in the newsgroup. My question sounds more like a MapReduce question than a Hadoop HDFS one. To my understanding, the JobClient will submit all Mapper and Reducer classes in a uniform way to the cluster? Can I assume this is more like a uniform scheduler…

dfsadmin reporting wrong disk usage numbers

2008-02-15 Thread Martin Traverso
Hi, Are there any known issues with how dfsadmin reports disk usage? I'm getting some weird values: Name: 10.15.104.46:50010 State: In Service Total raw bytes: 1433244008448 (1.3 TB) Remaining raw bytes: 383128089432 (356.82 GB) Used raw bytes: 1042296986024 (970.71 GB) % used: 72.72% …

Re: Can reduce output in two different output formats?

2008-02-15 Thread Arun C Murthy
On Feb 14, 2008, at 2:09 PM, Jason Venner wrote: We write a separate file in many of our mappers and/or reducers. We are somewhat concerned about speculative execution and what happens to the output files of killed jobs, but it seems to work fine. We build the output files by passing in a…
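For reference, a minimal Java sketch (not the approach from the truncated message above) of building a per-task-attempt name for such a side-output file, so that speculative or re-run attempts never collide; the configuration keys mapred.output.dir and mapred.task.id are assumed to be the ones exposed to tasks in this era of Hadoop.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class SideOutputPath {
      // Build a per-task-attempt file name under the job output directory so
      // that speculative or re-run attempts write to distinct files.
      public static Path sideFile(JobConf conf, String prefix) {
        String outDir = conf.get("mapred.output.dir");   // job output directory (assumed key)
        String attempt = conf.get("mapred.task.id");     // unique per task attempt (assumed key)
        return new Path(outDir, prefix + "-" + attempt);
      }
    }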

Re: Using jmx fails because of multiple port listeners

2008-02-15 Thread Allen Wittenauer
On 2/15/08 9:19 AM, "Nathan Wang" <[EMAIL PROTECTED]> wrote: > Right, you can't add that line globally. That will affect all processes. > > What you can do is to modify this file: HADOOP_HOME/bin/hadoop. > For each process, give a different port number. See also https://issues.apache.org

RE: Using jmx fails because of multiple port listeners

2008-02-15 Thread Nathan Wang
Right, you can't add that line globally. That will affect all processes. What you can do is modify this file: HADOOP_HOME/bin/hadoop. For each process, give a different port number. For example, for the tasktracker, assign port 12345: ... elif [ "$COMMAND" = "tasktracker" ] ; then CLASS=org.apache.hadoop.mapred.TaskTracker…
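For reference, a hedged sketch of the kind of edit described above, in the tasktracker branch of HADOOP_HOME/bin/hadoop (the exact surrounding script lines vary by release; the JMX settings are standard JVM system properties):

    # Hedged sketch; exact script contents vary by Hadoop release.
    elif [ "$COMMAND" = "tasktracker" ] ; then
      CLASS=org.apache.hadoop.mapred.TaskTracker
      HADOOP_OPTS="$HADOOP_OPTS -Dcom.sun.management.jmxremote.port=12345 \
        -Dcom.sun.management.jmxremote.authenticate=false \
        -Dcom.sun.management.jmxremote.ssl=false"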

Using jmx fails because of multiple port listeners

2008-02-15 Thread Ferdy Galema
If I use the following parameters in mapred.child.java.opts, then the Reduce tasks will immediately fail with exit code 1. -Dcom.sun.management.jmxremote.port=7575 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false The problem is the fact that there are 2…

0.15.3 dfshealth failures

2008-02-15 Thread Jason Venner
We have several clusters, and on two of them dfshealth.jsp does not run. To the best of our knowledge the clusters are identical except for the slaves, and the dfs and tasktracker. I don't seem to find anything in the log files for the webapps. The jobtracker.jsp runs without problems. What…

RE: specifying Hadoop disk space

2008-02-15 Thread Chandran, Sathish
Many thanks, jdcryans :) Regards|Sathish -Original Message- From: Jean-Daniel Cryans [mailto:[EMAIL PROTECTED]] Sent: Friday, February 15, 2008 7:13 PM To: core-user@hadoop.apache.org Subject: Re: specifying Hadoop disk space Hi, Have you read: http://wiki.apache.org/hadoop/QuickStart

Re: specifying Hadoop disk space

2008-02-15 Thread Jean-Daniel Cryans
Hi, Have you read http://wiki.apache.org/hadoop/QuickStart, Stage 3, second bullet? Regards, jdcryans 2008/2/15, Chandran, Sathish <[EMAIL PROTECTED]>: > Hi all, > Can you help me out with the following? > Normally Hadoop takes the free disk space available on the machine. > But I…
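For reference, the question above is truncated, but one common way to keep HDFS from consuming all of a machine's free disk space in this era was the dfs.datanode.du.reserved property; a hedged hadoop-site.xml sketch with an illustrative value:

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>10737418240</value>
      <description>Reserved space in bytes per volume (here 10 GB) that the
      datanode will leave free for non-DFS use. The value shown is illustrative.</description>
    </property>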