Hi,
Let's assume that there are two jobs, J1 (100 map tasks) and J2 (200 map
tasks), and the cluster has a capacity of 150 map tasks (15 nodes with 10 map
slots per node), and Hadoop is using the default FIFO scheduler. If I submit
J1 first and then J2, will the jobs run in parallel, or does J1 have to
complete before J2 starts?
In most cases, your job will have more map tasks than map slots. You
want the reducers to spin up at some point before all your maps
complete, so that the shuffle and sort can work in parallel with some
of your map tasks. I usually set slow start to 80%, sometimes higher
if I know the maps are
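(For reference, a minimal sketch of setting slow start per job with the old
0.20-era API; the property name is the one this Hadoop generation uses, and
the driver class is a placeholder:)

import org.apache.hadoop.mapred.JobConf;

// Sketch: in the job driver, before submission. Start launching reducers
// once 80% of the map tasks have completed.
JobConf conf = new JobConf(MyJob.class); // MyJob is a hypothetical driver class
conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);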
Yes Devaraj,
From the logs, it looks like it failed to create /jobtracker/jobsInfo.
code snippet:
if (!fs.exists(path)) {
  if (!fs.mkdirs(path, new FsPermission(JOB_STATUS_STORE_DIR_PERMISSION))) {
    throw new IOException(
        "CompletedJobStatusStore mkdirs failed to create " + path.toString());
  }
}
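(For reference: that directory comes from the JobTracker's persist-jobstatus
settings; a minimal sketch, assuming the standard 0.20-era property names,
with the target path below being only an example. Normally these go in the
JobTracker's mapred-site.xml rather than code:)

import org.apache.hadoop.conf.Configuration;

// Sketch: "/jobtracker/jobsInfo" is the default value of the *.dir property
// below; pointing it at a directory the JobTracker user can create and write
// (example path here) avoids the mkdirs failure.
Configuration conf = new Configuration();
conf.setBoolean("mapred.job.tracker.persist.jobstatus.active", true);
conf.set("mapred.job.tracker.persist.jobstatus.dir", "/user/hadoop/jobsInfo");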
Thanks, got the point. So, the shuffle and sort can happen in parallel even
before all the map tasks are completed, but the reduce happens only after
all the map tasks are complete.
Praveen
On Thu, Sep 22, 2011 at 7:13 PM, Joey Echeverria j...@cloudera.com wrote:
In most cases, your job will
Sorry about the confusion then. Look at the code that Swathi sent. Your
problem is probably with timing somewhere. You may be launching the second
job before the first one has completely finished.
--Bobby Evans
On 9/21/11 7:27 PM, 谭军 tanjun_2...@163.com wrote:
Bobby Evans
Temp files are on
Hello All,
I have recently switched my small Hadoop dev cluster (v0.20.1) to use the
FairScheduler. I have a maximum of 128 map slots available and recently noticed
that my jobs seem to use at most 16 at any given time (the job I am
looking at in particular runs for about 15 minutes) - they
If you dig into the job history in the web UI, can you confirm whether it is the
same 16 tasktracker slots that are getting the map tasks? It's a long shot, but it
could be that the work is actually distributing across your cluster and some
other issue is cropping up. Also, how long does each
Okay, I put in a Thread.sleep to test my theory, and it will run all 128 at a time
- they were just completing too quickly. I guess there is no other way to get
around it, unless someone knows how to make the scheduler schedule faster...
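(For anyone reproducing the test: a minimal sketch of that kind of artificial
delay, using the old 0.20 API; the mapper class is hypothetical and the sleep
length is arbitrary.)

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper that sleeps so tasks live long enough to observe
// how many slots the scheduler actually fills at once.
public class SleepyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, LongWritable, Text> {
  public void map(LongWritable key, Text value,
      OutputCollector<LongWritable, Text> out, Reporter reporter)
      throws IOException {
    try {
      Thread.sleep(30 * 1000L); // arbitrary delay; fine for a tiny test input
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    out.collect(key, value);
  }
}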
-- Adam
From: Adam Shook [mailto:ash...@clearedgeit.com]
Just to confirm your configuration, how many logical cores do these boxes
actually have (I am assuming dual quad core HT'ed)? Do you not have any reduce
slots allocated?
Matt
From: Adam Shook [mailto:ash...@clearedgeit.com]
Sent: Thursday, September 22, 2011 3:22 PM
To:
Each box has 3 (seems weird) quad-core HT'ed CPUs for a total of 12 HT'ed cores
per machine. 16 map slots and 8 reduce slots each.
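(For reference, those per-node counts come from the tasktracker slot
properties in mapred-site.xml; a small sketch that just reads back the
effective values, with the config path and class name below being examples:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Sketch: both slot properties default to 2 when unset.
public class SlotCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml")); // example path
    System.out.println("map slots: "
        + conf.getInt("mapred.tasktracker.map.tasks.maximum", 2));
    System.out.println("reduce slots: "
        + conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2));
  }
}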
-- Adam
From: GOEKE, MATTHEW (AG/1000) [mailto:matthew.go...@monsanto.com]
Sent: Thursday, September 22, 2011 4:26 PM
To: mapreduce-user@hadoop.apache.org
Subject:
It is enabled -- I'll try disabling it. The job runs over a large number of
files that are each roughly the size of an HDFS block, so fewer splits isn't much
of an option.
-----Original Message-----
From: Joey Echeverria [mailto:j...@cloudera.com]
Sent: Thursday, September 22, 2011 4:28 PM
To:
I don't have much control over the cluster configuration, but I will speak with
those who do. As far as the input splits are concerned, I will look into some
custom CombineFileInputFormat stuff.
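(For reference, a minimal sketch of that approach with the old 0.20 API's
CombineFileInputFormat; the class names and the split-size cap are
hypothetical:)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

// Hypothetical input format that packs many block-sized files into fewer
// splits, so the job spawns fewer short-lived map tasks.
public class CombinedTextInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  public CombinedTextInputFormat() {
    setMaxSplitSize(256L * 1024 * 1024); // example cap on combined split size
  }

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf conf, Reporter reporter) throws IOException {
    return new CombineFileRecordReader<LongWritable, Text>(
        conf, (CombineFileSplit) split, reporter,
        (Class) CombinedLineReader.class); // unchecked cast for the generic param
  }

  // Per-file reader with the (CombineFileSplit, Configuration, Reporter,
  // Integer) constructor that CombineFileRecordReader instantiates.
  public static class CombinedLineReader extends LineRecordReader {
    public CombinedLineReader(CombineFileSplit split, Configuration conf,
        Reporter reporter, Integer index) throws IOException {
      super(conf, new FileSplit(split.getPath(index),
          split.getOffset(index), split.getLength(index), (String[]) null));
    }
  }
}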
Thank you both very much!
-- Adam
From: GOEKE, MATTHEW (AG/1000)
Hi!
I have changed the permissions for the hadoop extract and the /jobstory and
/history/done dirs recursively:
$ chmod -R 777 branch-0.22
$ chmod -R 777 logs
$ chmod -R 777 jobtracker
but I still get the same problem.
The permissions are like this http://pastebin.com/sw3UPM8t
The log is here