Hi all,
I'm running Hadoop 0.20.2 on a cluster of 4 machines, and I'm trying
to run my own job. Here is my configuration:
core-site.xml:
  fs.default.name = hdfs://net-server00:44310
  hadoop.tmp.dir = /tmp/hadoop-leonardi
hdfs-site.xml:
  dfs.replication = 1
mapred-site.xml:
  mapred.job.tr
Are you doing much in your reducer? Try reporting a string back at
various progress points of your reducer functionality (like a status
update).
A task is timed out if it has not written anything, read anything, or
reported anything about its status within the timeout period.
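To make that concrete, here is a minimal sketch of the cadence logic. It is plain Java so it runs anywhere; the actual Hadoop calls (reporter.setStatus() and reporter.progress() in the 0.20 API) are shown only in comments, and the reporting interval is a hypothetical choice:

```java
// Sketch: report status every N records so the TaskTracker's task
// timeout clock keeps getting reset while the reducer is busy.
public class ProgressCadence {
    static final int REPORT_EVERY = 1000; // hypothetical interval

    // Returns true when it is time to report progress again.
    static boolean shouldReport(long recordsSeen) {
        return recordsSeen % REPORT_EVERY == 0;
    }

    public static void main(String[] args) {
        long processed = 0;
        for (int i = 0; i < 5000; i++) {
            processed++;
            if (shouldReport(processed)) {
                // In a real 0.20-API reducer this would be:
                //   reporter.setStatus("processed " + processed + " values");
                //   reporter.progress();
                System.out.println("processed " + processed + " values");
            }
        }
    }
}
```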
On Sat, Dec 18, 2010 at 7:24 PM, Ivan Leona
Hi Ivan,
There are two possibilities - First is that you are doing some very
heavy calculation in the reduce phase which is taking too long. In this
case, you can try increasing the timeout with these config params:
mapreduce.task.timeout
mapreduce.tasktracker.healthchecker.script.timeout
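For example, a mapred-site.xml fragment raising the task timeout might look like this (note: in 0.20.x the property is named mapred.task.timeout; mapreduce.task.timeout is the newer name, and the value is in milliseconds):

```xml
<property>
  <!-- 0.20.x name; later releases use mapreduce.task.timeout -->
  <name>mapred.task.timeout</name>
  <!-- 30 minutes, in milliseconds; pick a value that fits your job -->
  <value>1800000</value>
</property>
```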
Hi Ivan
I faced the same thing. Try adding the host names with their IP
addresses to /etc/hosts, like:
# master
<master-ip>  masterhostname
# slave
<slave-ip>   slavehostname
On Sat, Dec 18, 2010 at 7:54 AM, Ivan Leonardi wrote:
> Hi all,
> I'm running Hadoop 0.20.2 in a cluster setup built by 4 machines, and
> I'm
Yes, I'm performing computation on large graphs (3.7M nodes) and set
numReduceTasks to 60. The code is in the attachment. In line 23 I'm
using a grouping function that computes the group ID of a given node
ID: the number of groups actually corresponds to the numReduceTasks
parameter. Am I
Yes, I'm performing computation on large graphs (3.7M nodes) and set
numReduceTasks to 60. Below is the code. I'm using a grouping
function grouper.getGroup(u) that computes the group ID of the node
having u as its ID: the number of groups actually corresponds to the
numReduceTasks paramet
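The original code is not in the archive, so purely as an illustration of the idea, a grouping function like the grouper.getGroup(u) described above might reduce a node ID modulo the number of groups; all names and the modulo scheme here are hypothetical:

```java
// Hypothetical sketch of a grouping function: maps a node ID to one
// of NUM_GROUPS group IDs, so that with numReduceTasks == NUM_GROUPS
// each group lands on its own reduce task.
public class Grouper {
    static final int NUM_GROUPS = 60;

    // Simple modulo grouping; only an illustration, since the real
    // grouper.getGroup(u) from the thread is not shown.
    static int getGroup(long nodeId) {
        return (int) (nodeId % NUM_GROUPS);
    }

    public static void main(String[] args) {
        System.out.println("node 3700000 -> group " + getGroup(3700000L));
    }
}
```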
Try adding logging or printing to stdout to find exactly where the
problem occurs. Or, as Harsh said, if you keep reporting status at
regular intervals, the task won't get killed; then you can see whether
the job ever runs to completion.
On Sat, Dec 18, 2010 at 8:12 PM, Ivan Leonardi wrote:
> Yes, I'm perfo
Furthermore, the 'syslog' file of a failing reduce task attempt
doesn't show any warning about it; here is its tail:
2010-12-18 15:49:29,773 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 60 segments left of total size: 35288842
bytes
2010-12-18 15:49:31,012 INFO org.
Hello everybody,
I am wondering whether there is a feature allowing (in my case) reduce
tasks to communicate, for example via some volatile variables at a
centralized point, or maybe just a way to notify other running or
to-be-running reduce tasks that a reduce task has completed, along
with some arguments.
In my cas
In your reducer, you can utilize Reporter (getCounter and incrCounter
methods) to pass this information between reducers.
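As a plain-Java stand-in for the idea: counters are job-wide aggregates that the framework merges, so the value one task observes from another can lag. In the 0.20 API the increment would be reporter.incrCounter(...); the threshold and helper names below are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java stand-in for the counter idea: one shared counter that
// several "reducers" increment, plus a threshold check that lets a
// reducer skip unnecessary work once the global count is high enough.
public class CounterSketch {
    static final AtomicLong done = new AtomicLong();
    static final long THRESHOLD = 3; // hypothetical cutoff

    // In a real reducer: reporter.incrCounter(MyCounters.DONE, 1);
    static void recordCompletion() { done.incrementAndGet(); }

    // Decide whether remaining work can be skipped.
    static boolean shouldSkipWork() { return done.get() >= THRESHOLD; }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) recordCompletion();
        System.out.println("skip further work? " + shouldSkipWork());
    }
}
```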
On Sat, Dec 18, 2010 at 8:04 AM, Martin Becker <_martinbec...@web.de> wrote:
> Hello everybody,
>
> I am wondering if there is a feature allowing (in my case) reduce
> tasks to
Hi All,
Is there any way to influence where a reduce task is run? We have a case where
we'd like to choose the host to run the reduce task based on the task's input
key.
Any suggestion is greatly appreciated.
Thanks,
Jane
Hi Jane,
The partitioner class can be used to achieve this. (
http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/Partitioner.html
).
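Note that a custom Partitioner only controls which reduce task (partition number) a key goes to, not which physical host that task is scheduled on. A minimal sketch of the selection logic, shown as plain Java with the Hadoop hook in a comment:

```java
// Sketch of the key-to-partition logic a custom Partitioner would use.
// In Hadoop 0.21 you would subclass
// org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE> and put this
// logic in getPartition(key, value, numPartitions).
public class KeyPartition {
    // Mirrors the default hash partitioning: non-negative hash of the
    // key, modulo the number of reduce tasks.
    static int getPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println("\"jane\" -> partition " + getPartition("jane", 4));
    }
}
```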
Thanks,
Hari
On Sat, Dec 18, 2010 at 11:13 PM, Jane Chen wrote:
> Hi All,
>
> Is there anyway to influence where a reduce ta
But how does this help me request which host to schedule the reduce task to?
Thanks,
Jane
--- On Sat, 12/18/10, Hari Sreekumar wrote:
From: Hari Sreekumar
Subject: Re: How to Influence Reduce Task Location.
To: mapreduce-user@hadoop.apache.org
Date: Saturday, December 18, 2010, 10:16 AM
Hi
Thank you Ted,
I am using the 0.21 API, so I would be drawing Counters from the
Context. So if a Counter is increased on a certain Reducer, would
other Reducers retrieve that increased value when accessing the same
Counter? If so, that is an interesting piece of information.
Unfortunately my t
You can specify that a group of keys should go to the same host for
reducing, but I have never encountered any situation where you need to know
beforehand exactly which host a particular key should go to. I am not sure
if that can be done. Just out of curiosity, why do you need this kind of
control
> Reducers would retrieve that increased value when accessing the same
> Counter?
I do not think counters reflect real-time values. Even if they get updated, the
values will lag.
If you require up-to-date values, I am afraid you will have to run a single reducer.
Sent from my iPhone 4
On Dec 18, 201
Hello Jason,
real time values are not required. Some lagging is tolerable. The
value/threshold communication is only needed to keep other reducers
from doing unnecessary work. Some upper bound would be nice to know,
though. A single reducer is not an option for my algorithm. That would
defeat the
> real time values are not required. Some lagging is tolerable. The
> value/threshold communication is only needed to keep other reducers
> from doing unnecessary work
Then I think counters are what you really need here (assuming they really
do get updated, but I have never tried that)
Well, I will have to try, if there is no other solution. I _need_
something else, as mentioned, but if it is not supported I will have
to stick with what is there.
Yet I also see no reason for Hadoop MapReduce not to support some
kind of message passing. If I am missing some point here, I would like t