Hi
I am using Hadoop 0.19.1. I want to configure a cluster of 7 nodes, with 2 of
them acting as secondary name nodes. I put the hostnames of the 2 secondary
name nodes in the conf/masters file and started the cluster using the
start-all.sh script.
The cluster (HDFS and MapReduce) is working properly. However, both
On Sat, Apr 4, 2009 at 10:25 PM, Foss User wrote:
>
> On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon wrote:
> > On Sat, Apr 4, 2009 at 3:47 AM, Foss User wrote:
> >>
> >> 1. Should I edit conf/slaves on all nodes or only on the name node? Do
> >> I have to edit this on the job tracker too?
> >>
> >
> > T
On Sat, Apr 4, 2009 at 6:46 PM, Foss User wrote:
> Whenever I try to start the DFS, I get this error:
>
> had...@namenode:~/hadoop-0.19.1$ bin/start-dfs.sh
> starting namenode, logging to
> /home/hadoop/hadoop-0.19.1/bin/../logs/hadoop-hadoop-namenode-hadoop-namenode.out
> 10.31.253.142: starting
I have a few more questions on your answers. Please see them inline.
On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon wrote:
> On Sat, Apr 4, 2009 at 3:47 AM, Foss User wrote:
>>
>> 1. Should I edit conf/slaves on all nodes or only on the name node? Do
>> I have to edit this on the job tracker too?
>>
>
> T
On Sat, Apr 4, 2009 at 3:47 AM, Foss User wrote:
> Certain things are not clear, so I am asking them point by point. I have a
> setup of 4 Linux machines: 1 name node, 1 job tracker, and 2 slaves
> (each is a data node as well as a task tracker).
>
For a cluster of this size, you probably want to run one ma
On Sat, Apr 4, 2009 at 2:11 PM, Christian Ulrik Søttrup wrote:
> Hello all,
>
> I need to do some calculations that have to merge two sets of very large
> data (basically calculating variance).
> One set contains "means" and the second a set of objects, each tied to
> a mean.
>
> Normally I would send the set of means using the distributed cache,
> but the set has become too large
This is discussed in chapter 8 of my book.
In short, if both data sets are:
- in the same key order,
- partitioned with the same partitioner, and
- read with the same input format (necessary for this simple example only),
a map side join will present all the key value pairs of each data set,
matched by key, to the map task.
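Those conditions are what make a streaming merge possible: both inputs can be
walked forward in step and matching keys combined, without holding either set
in memory. As a small language-neutral sketch of the idea (file names and data
are hypothetical, not from this thread), the Unix `join` of two key-sorted
files followed by a per-key squared-deviation sum mirrors the variance use
case; Hadoop's map side join support (the org.apache.hadoop.mapred.join
package) applies the same idea at scale:

```shell
# Hypothetical inputs, both sorted by key -- the same precondition a
# map side join needs. means.txt holds (key, mean); obs.txt holds
# (key, observation), possibly many rows per key.
printf 'a 10\nb 20\n' > means.txt
printf 'a 9\na 11\nb 19\nb 21\n' > obs.txt

# Stream-merge on the shared key, then accumulate (x - mean)^2 per key.
join means.txt obs.txt |
  awk '{ d = $3 - $2; ss[$1] += d * d; n[$1]++ }
       END { for (k in n) printf "%s %.1f\n", k, ss[k] / n[k] }' |
  sort
```

Because `join` only ever looks at the current line of each file, neither data
set is buffered; that is the property the sorted, identically partitioned
inputs buy you in the Hadoop case.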
Is there a command that I can run from the shell that says whether a job
passed or failed?
I found these, but they don't really say pass/fail; they only say what is
running and the percent complete.
This shows what is running:
./hadoop job -list
and this shows the completion:
./hadoop job -status job_20090
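There is no dedicated pass/fail flag on the job command line in this era of
Hadoop; the usual workarounds are the Java API (JobClient.getJob(id), then
RunningJob.isSuccessful()) or scraping the -status text. A minimal shell
sketch of the scraping approach, where the SUCCEEDED wording is an assumption
about the status output, so check what your `hadoop job -status` actually
prints and adjust the pattern:

```shell
# Hypothetical helper: reads `hadoop job -status <id>` output on stdin
# and exits 0 if the job succeeded. The SUCCEEDED/FAILED wording is an
# assumption about the status text, not a documented 0.19 format.
job_passed() {
  grep -q 'SUCCEEDED'
}

# Against a live cluster (job id hypothetical):
#   ./hadoop job -status job_200904041234_0001 | job_passed && echo passed
```

The helper's exit code makes it usable directly in scripts and cron jobs
(`if job_passed; then ...`).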
I have a namenode and a job tracker on two different machines.
I see that the namenode tries to do an ssh login into itself (the name node),
the job tracker, as well as all the slave machines.
However, the job tracker tries to do an ssh login into the slave
machines only. Why this difference in behavior? Could someone explain?
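For context on the mechanism being asked about: start-dfs.sh starts the name
node locally and then ssh-es to every host listed in conf/slaves (data nodes)
and conf/masters (secondary name nodes), so an ssh back into the name node
usually means it appears in one of those files; start-mapred.sh only consults
conf/slaves for task trackers. A simplified sketch of that fan-out (hostnames
hypothetical; the real logic is bin/slaves.sh invoking bin/hadoop-daemon.sh on
each host), with echo standing in for the actual ssh call so it runs without a
cluster:

```shell
# Simplified sketch of how the Hadoop start scripts fan out over ssh.
# echo stands in for the real ssh invocation.
start_daemons() {  # $1 = file of hostnames, $2 = daemon to start
  while read -r host; do
    echo "ssh $host hadoop-daemon.sh start $2"
  done < "$1"
}

printf 'slave1\nslave2\n' > slaves.txt
start_daemons slaves.txt datanode
# prints:
#   ssh slave1 hadoop-daemon.sh start datanode
#   ssh slave2 hadoop-daemon.sh start datanode
```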
Whenever I try to start the DFS, I get this error:
had...@namenode:~/hadoop-0.19.1$ bin/start-dfs.sh
starting namenode, logging to
/home/hadoop/hadoop-0.19.1/bin/../logs/hadoop-hadoop-namenode-hadoop-namenode.out
10.31.253.142: starting datanode, logging to
/home/hadoop/hadoop-0.19.1/bin/../logs/h
I was going through the tutorial here.
http://hadoop.apache.org/core/docs/current/cluster_setup.html
Certain things are not clear, so I am asking them point by point. I have a
setup of 4 Linux machines: 1 name node, 1 job tracker, and 2 slaves
(each is a data node as well as a task tracker).
1. Should I edi