SecondaryNameNode cannot create checkpoint

2009-04-04 Thread Edwin Chu
Hi, I am using Hadoop 0.19.1. I want to configure a cluster of 7 nodes, with 2 of them acting as secondary namenodes. I put the hostnames of the 2 secondary namenodes in the conf/masters file and started the cluster using the start-all.sh script. The cluster (HDFS and MapReduce) is working properly. However, both
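
For reference, a minimal sketch of the layout being described (hostnames and values here are hypothetical, using the stock 0.19 checkpoint keys). Each secondary needs its own fs.checkpoint.dir; fs.checkpoint.period (seconds) controls how often a checkpoint is attempted:

$ cat conf/masters
secondary1.example.com
secondary2.example.com

<!-- hadoop-site.xml on each secondary namenode -->
<property>
  <name>fs.checkpoint.dir</name>
  <value>/home/hadoop/dfs/namesecondary</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>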

Re: Newbie questions on Hadoop topology

2009-04-04 Thread Todd Lipcon
On Sat, Apr 4, 2009 at 10:25 PM, Foss User wrote:
>
> On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon wrote:
> > On Sat, Apr 4, 2009 at 3:47 AM, Foss User wrote:
> >>
> >> 1. Should I edit conf/slaves on all nodes or only on name node? Do I
> >> have to edit this in job tracker too?
> >>
> >
> > T
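
For what it's worth, a sketch of the files in question (hostnames are hypothetical). conf/slaves and conf/masters are read only by the start/stop scripts on the machine where those scripts are run, not by the daemons themselves, so in practice they only need to be maintained there:

namenode$ cat conf/slaves
slave1.example.com
slave2.example.com

namenode$ cat conf/masters
secondary.example.com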

Re: NullPointerException while starting start-dfs.sh

2009-04-04 Thread Foss User
On Sat, Apr 4, 2009 at 6:46 PM, Foss User wrote:
> Whenever I try to start the DFS, I get this error:
>
> had...@namenode:~/hadoop-0.19.1$ bin/start-dfs.sh
> starting namenode, logging to
> /home/hadoop/hadoop-0.19.1/bin/../logs/hadoop-hadoop-namenode-hadoop-namenode.out
> 10.31.253.142: starting

Re: Newbie questions on Hadoop topology

2009-04-04 Thread Foss User
I have a few more questions on your answers. Please see them inline.

On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon wrote:
> On Sat, Apr 4, 2009 at 3:47 AM, Foss User wrote:
>>
>> 1. Should I edit conf/slaves on all nodes or only on name node? Do I
>> have to edit this in job tracker too?
>>
>
> T

Re: Newbie questions on Hadoop topology

2009-04-04 Thread Todd Lipcon
On Sat, Apr 4, 2009 at 3:47 AM, Foss User wrote:
> Certain things are not clear. I am asking them point-wise. I have a
> setup of 4 linux machines. 1 name node, 1 job tracker and 2 slaves
> (each is data node as well as task tracker).
>
For a cluster of this size, you probably want to run one ma

Re: joining two large files in hadoop

2009-04-04 Thread Todd Lipcon
On Sat, Apr 4, 2009 at 2:11 PM, Christian Ulrik Søttrup wrote:
> Hello all,
>
> I need to do some calculations that has to merge two sets of very large
> data (basically calculate variance).
> One set contains a set of "means" and the second a set of objects tied to
> a mean.
>
> Normally I woul

Re: joining two large files in hadoop

2009-04-04 Thread jason hadoop
This is discussed in chapter 8 of my book. In short, if both data sets are:
- in the same key order,
- partitioned with the same partitioner,
- in the same input format (necessary for this simple example only),
a map-side join will present all the key-value pairs of e
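
As a rough illustration of the setup being described, a sketch against the org.apache.hadoop.mapred.join API in 0.19 (the input paths and the choice of KeyValueTextInputFormat are assumptions, not from the thread):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class MapSideJoinSetup {
  public static void configure(JobConf conf) {
    // Both inputs must already be sorted by key and partitioned with
    // the same partitioner into the same number of parts.
    conf.setInputFormat(CompositeInputFormat.class);
    // "inner" keeps only keys present in both data sets.
    conf.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", KeyValueTextInputFormat.class,
        new Path("/data/means"), new Path("/data/objects")));
    // The mapper then receives one TupleWritable per key, holding the
    // matching records from each input side by side.
  }
}

If the inputs cannot meet those ordering and partitioning requirements, a reduce-side join (tag each record with its source, shuffle on the join key, and merge in the reducer) trades them for a full sort and shuffle.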

Re: joining two large files in hadoop

2009-04-04 Thread Ken Krugler
I need to do some calculations that have to merge two sets of very large data (basically calculating variance). One set contains a set of "means" and the second a set of objects tied to a mean. Normally I would send the set of means using the distributed cache, but the set has become too large

joining two large files in hadoop

2009-04-04 Thread Christian Ulrik Søttrup
Hello all,

I need to do some calculations that have to merge two sets of very large data (basically calculating variance). One set contains a set of "means" and the second a set of objects tied to a mean. Normally I would send the set of means using the distributed cache, but the set has bec

job status from command prompt

2009-04-04 Thread Elia Mazzawi
Is there a command that I can run from the shell that says whether a job passed or failed? I found these, but they don't really say pass/fail; they only say what is running and the percent complete.

This shows what is running:
./hadoop job -list

and this shows the completion:
./hadoop job -status job_20090
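
Not from the thread, but one way to get an explicit pass/fail is to ask the JobTracker directly; a sketch against the 0.19 JobClient API (the checker class itself is hypothetical):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class JobPassFail {
  public static void main(String[] args) throws Exception {
    // Look up a job id such as job_200904041234_0001 on the JobTracker
    // named in the local Hadoop configuration.
    JobClient client = new JobClient(new JobConf(JobPassFail.class));
    RunningJob job = client.getJob(JobID.forName(args[0]));
    if (job == null) {
      System.out.println("unknown job");
    } else if (!job.isComplete()) {
      System.out.println("running, map " + (int) (job.mapProgress() * 100) + "%");
    } else {
      System.out.println(job.isSuccessful() ? "passed" : "failed");
    }
  }
}

Alternatively, if the job is submitted in the foreground with JobClient.runJob(), the submitting process exits non-zero when the job fails, so a wrapper script can simply test the exit code.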

Why namenode logs into itself as well as job tracker?

2009-04-04 Thread Foss User
I have a namenode and a job tracker on two different machines. I see that the namenode tries to ssh into itself (the name node), into the job tracker, and into all the slave machines. However, the job tracker tries to ssh only into the slave machines. Why this difference in behavior? Could someon
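
A rough sketch of the control flow behind this (paraphrasing the start scripts, not quoting them):

# start-all.sh, run on one machine, does roughly:
#
#   start-dfs.sh     start namenode locally, then
#                    ssh <each host in conf/slaves>   start datanode
#                    ssh <each host in conf/masters>  start secondarynamenode
#
#   start-mapred.sh  start jobtracker locally, then
#                    ssh <each host in conf/slaves>   start tasktracker
#
# The helper (bin/slaves.sh) sshes to every listed host, with no special
# case for the local machine, which is why a node can ssh into itself.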

NullPointerException while starting start-dfs.sh

2009-04-04 Thread Foss User
Whenever I try to start the DFS, I get this error:

had...@namenode:~/hadoop-0.19.1$ bin/start-dfs.sh
starting namenode, logging to
/home/hadoop/hadoop-0.19.1/bin/../logs/hadoop-hadoop-namenode-hadoop-namenode.out
10.31.253.142: starting datanode, logging to
/home/hadoop/hadoop-0.19.1/bin/../logs/h
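
Not stated in the thread, but a standard first step: the .out file named above captures only stdout, while the full stack trace for an exception like this normally lands in the matching .log file, e.g.:

tail -n 50 /home/hadoop/hadoop-0.19.1/logs/hadoop-hadoop-namenode-hadoop-namenode.log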

Newbie questions on Hadoop topology

2009-04-04 Thread Foss User
I was going through the tutorial here:
http://hadoop.apache.org/core/docs/current/cluster_setup.html

Certain things are not clear. I am asking them point-wise. I have a setup of 4 Linux machines: 1 name node, 1 job tracker, and 2 slaves (each is a data node as well as a task tracker).

1. Should I edi