Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
Hi, Let's assume that there are two jobs, J1 (100 map tasks) and J2 (200 map tasks), the cluster has a capacity of 150 map slots (15 nodes with 10 map slots per node), and Hadoop is using the default FIFO scheduler. If I submit J1 first and then J2, will the jobs run in parallel or the job J1 has
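With FIFO, the earlier-submitted job is offered slots first, and the later job runs only in whatever slots are left over. A minimal sketch of that slot arithmetic for the numbers above (plain Java, not actual scheduler code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: FIFO hands slots to jobs in submission order; a job submitted
// later only runs in whatever slots the earlier jobs did not need.
public class FifoSlots {
    static Map<String, Integer> allocate(int clusterSlots, Map<String, Integer> jobsInOrder) {
        Map<String, Integer> allocation = new LinkedHashMap<>();
        int free = clusterSlots;
        for (Map.Entry<String, Integer> job : jobsInOrder.entrySet()) {
            int granted = Math.min(free, job.getValue()); // pending map tasks for this job
            allocation.put(job.getKey(), granted);
            free -= granted;
        }
        return allocation;
    }

    public static void main(String[] args) {
        Map<String, Integer> jobs = new LinkedHashMap<>();
        jobs.put("J1", 100); // submitted first, 100 map tasks
        jobs.put("J2", 200); // submitted second, 200 map tasks
        // 15 nodes * 10 map slots = 150 slots
        System.out.println(allocate(150, jobs)); // prints {J1=100, J2=50}
    }
}
```

So both jobs run at once in this scenario: J1 gets all 100 slots it can use, and J2 fills the remaining 50.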

Re: Regarding FIFO scheduler

2011-09-22 Thread Joey Echeverria
In most cases, your job will have more map tasks than map slots. You want the reducers to spin up at some point before all your maps complete, so that the shuffle and sort can work in parallel with some of your map tasks. I usually set slow start to 80%, sometimes higher if I know the maps are
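The "slow start" knob Joey refers to is, in this era of Hadoop, the `mapred.reduce.slowstart.completed.maps` property (the fraction of maps that must complete before reducers launch; the default is much lower). A sketch of setting it to 80% in mapred-site.xml or per job:

```xml
<!-- mapred-site.xml: launch reducers once 80% of map tasks have completed -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```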

Re: RE: Making Mumak work with capacity scheduler

2011-09-22 Thread Uma Maheswara Rao G 72686
Yes Devaraj, From the logs, it looks like it failed to create /jobtracker/jobsInfo. Code snippet: if (!fs.exists(path)) { if (!fs.mkdirs(path, new FsPermission(JOB_STATUS_STORE_DIR_PERMISSION))) { throw new IOException("CompletedJobStatusStore mkdirs failed to create

Re: Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
Thanks, got the point. So, the shuffle and sort can happen in parallel even before all the map tasks are completed, but the reduce happens only after all the map tasks are complete. Praveen On Thu, Sep 22, 2011 at 7:13 PM, Joey Echeverria j...@cloudera.com wrote: In most cases, your job will

Re: How do I set the intermediate output path when I use 2 mapreduce jobs?

2011-09-22 Thread Robert Evans
Sorry about the confusion then. Look at the code that Swathi sent. Your problem is probably with timing somewhere. You may be launching the second job before the first one has completely finished. --Bobby Evans On 9/21/11 7:27 PM, 谭军 tanjun_2...@163.com wrote: Bobby Evans Temp files are on
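With the old mapred API, the usual culprit is calling JobClient.submitJob (which returns immediately) where JobClient.runJob (which blocks until the job finishes) was needed. The same ordering bug, sketched with plain Java futures standing in for the two jobs (names are illustrative, not Hadoop API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch (plain Java, not Hadoop): "job 2" reads what "job 1" wrote,
// so job 2 must not start until job 1 has completely finished.
public class ChainedJobs {
    static String runChain() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        StringBuilder intermediate = new StringBuilder(); // stands in for the temp output path

        Future<?> job1 = pool.submit(() -> intermediate.append("map-output"));
        job1.get(); // block here -- launching "job 2" before this point is the bug

        Future<String> job2 = pool.submit(() -> intermediate.toString().toUpperCase());
        String result = job2.get();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runChain()); // prints MAP-OUTPUT
    }
}
```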

FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
Hello All, I have recently switched my small Hadoop dev cluster (v0.20.1) to use the FairScheduler. I have a max of 128 map tasks available and recently noticed that my jobs seem to use a maximum of 16 at any given time (the job I am looking at in particular runs for about 15 minutes) - they

RE: FairScheduler Local Task Restriction

2011-09-22 Thread GOEKE, MATTHEW (AG/1000)
If you dig into the job history on the web-ui can you confirm whether it is the same 16 tasktracker slots that are getting the map tasks? Long shot but it could be that it is actually distributing across your cluster and there is some other issue that is springing up. Also, how long does each

RE: FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
Okay, I put in a Thread.sleep to test my theory and it will run all 128 at a time - they are just completing too quickly. I guess there is no other way to get around it, unless someone knows how to make the scheduler schedule faster... -- Adam From: Adam Shook [mailto:ash...@clearedgeit.com]
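If tasks finish faster than the scheduler can hand out replacements, one knob worth checking (assuming it is available in this 0.20-era FairScheduler build) is `mapred.fairscheduler.assignmultiple`, which lets the Fair Scheduler assign more than one task per TaskTracker heartbeat instead of one at a time:

```xml
<!-- mapred-site.xml: let the FairScheduler assign multiple tasks per
     TaskTracker heartbeat (exact semantics vary by 0.20.x build) -->
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
  <value>true</value>
</property>
```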

RE: FairScheduler Local Task Restriction

2011-09-22 Thread GOEKE, MATTHEW (AG/1000)
Just to confirm your configuration, how many logical cores do these boxes actually have (I am assuming dual quad core HT'ed)? Do you not have any reduce slots allocated? Matt From: Adam Shook [mailto:ash...@clearedgeit.com] Sent: Thursday, September 22, 2011 3:22 PM To:

RE: FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
Each box has 3 (seems weird) quad core HT'ed CPUs for a total of 12 HT'ed cores per machine. 16 map tasks and 8 reduce tasks each. -- Adam From: GOEKE, MATTHEW (AG/1000) [mailto:matthew.go...@monsanto.com] Sent: Thursday, September 22, 2011 4:26 PM To: mapreduce-user@hadoop.apache.org Subject:
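For reference, per-TaskTracker slot counts like the 16/8 split Adam describes are set in mapred-site.xml with these properties:

```xml
<!-- mapred-site.xml: per-TaskTracker slot counts -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>16</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```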

RE: FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
It is enabled -- I'll try disabling it. The job runs over a large number of files that are each roughly the size of an HDFS block, so fewer splits isn't much of an option. -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Thursday, September 22, 2011 4:28 PM To:

RE: FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
I don't have much control over the cluster configuration, but I will speak with those that do. As far as the input splits are concerned, I will look into some custom CombinedInputFormat stuff. Thank you both very much! -- Adam From: GOEKE, MATTHEW (AG/1000)

Re: RE: Making Mumak work with capacity scheduler

2011-09-22 Thread arun k
Hi ! I have changed the permissions for the hadoop extract and the /jobstory and /history/done dirs recursively: $chmod -R 777 branch-0.22 $chmod -R 777 logs $chmod -R 777 jobtracker but still I get the same problem. The permissions are like this http://pastebin.com/sw3UPM8t The log is here