Running Hadoop in different modes

2011-09-22 Thread Praveen Sripati
Hi, What are the features available in the Fully-Distributed Mode and the Pseudo-Distributed Mode that are not available in the Local (Standalone) Mode? Local (Standalone) Mode is very fast and I am able to run it in Eclipse as well. Thanks, Praveen
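
A minimal sketch of what separates the modes, using the 0.20-era property names (the localhost ports are placeholders, not from the thread): local mode keeps everything in one JVM on the local filesystem, which is why it is fast and easy to drive from Eclipse, while pseudo-distributed mode runs real daemons against HDFS on one host.

    import org.apache.hadoop.conf.Configuration;

    public class ModeConfigSketch {
      // Local (standalone) mode: one JVM, local filesystem, no daemons.
      public static Configuration localMode() {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "file:///");
        conf.set("mapred.job.tracker", "local");
        return conf;
      }

      // Pseudo-distributed mode: real HDFS and JobTracker daemons, single host.
      public static Configuration pseudoDistributedMode() {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");  // placeholder port
        conf.set("mapred.job.tracker", "localhost:9001");      // placeholder port
        return conf;
      }
    }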

Re: RE: Making Mumak work with capacity scheduler

2011-09-22 Thread arun k
Sorry, Q1: In the web GUI of the JobTracker I see both the queues, but the capacities are NOT reflected. Q2: All jobs by default are submitted to the "default" queue. How can I submit jobs to various queues in Mumak? Regards, Arun On Fri, Sep 23, 2011 at 11:57 AM, arun k wrote: > Hi guys ! > > I have run
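
For reference, outside the simulator a job picks its queue through JobConf; whether Mumak honors this per-job setting when replaying a trace is not confirmed here. A hedged sketch ("research" is a placeholder queue name that would have to appear in mapred.queue.names):

    import org.apache.hadoop.mapred.JobConf;

    public class QueueSubmitSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setQueueName("research");  // placeholder queue name
        // equivalent property form:
        // conf.set("mapred.job.queue.name", "research");
      }
    }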

Fwd: RE: Making Mumak work with capacity scheduler

2011-09-22 Thread arun k
Hi guys! I have run Mumak as sudo and it works fine. I am trying to run the job trace in test/data with the capacity scheduler. I have done: 1> Built contrib/capacity-scheduler 2> Copied the hadoop-*-capacity jar from build/contrib/capacity_scheduler to lib/ 3> Added mapred.jobtracker.taskScheduler and mapred.que
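
For context, the two properties Arun's step 3> refers to would look roughly like this, expressed through the Java API (a sketch; "queueA" is a placeholder, and per-queue capacities live in capacity-scheduler.xml, which must be on the JobTracker's classpath for the web GUI to show them):

    import org.apache.hadoop.conf.Configuration;

    public class CapacitySchedulerSetupSketch {
      public static Configuration configure() {
        Configuration conf = new Configuration();
        conf.set("mapred.jobtracker.taskScheduler",
                 "org.apache.hadoop.mapred.CapacityTaskScheduler");
        conf.set("mapred.queue.names", "default,queueA");  // placeholder queue
        // Per-queue capacity is read from capacity-scheduler.xml, e.g.:
        //   mapred.capacity-scheduler.queue.default.capacity = 50
        //   mapred.capacity-scheduler.queue.queueA.capacity  = 50
        return conf;
      }
    }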

Re: RE: Making Mumak work with capacity scheduler

2011-09-22 Thread arun k
Hi! I have changed the permissions for the Hadoop extract and the /jobstory and /history/done dirs recursively: $chmod -R 777 branch-0.22 $chmod -R logs $chmod -R jobtracker but I still get the same problem. The permissions are like this The log is here

RE: FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
I don't have much control over the cluster configuration, but I will speak with those that do. As far as the input splits are concerned, I will look into some custom CombinedInputFormat stuff. Thank you both very much! -- Adam From: GOEKE, MATTHEW (AG/1000) [mailto:matthew.go...@monsanto.co
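
A hedged sketch of the kind of "custom CombinedInputFormat stuff" Adam mentions, written against the old mapred API of 0.20; the class names are mine and the 128 MB cap is an arbitrary example. The point is to pack many block-sized files into fewer, larger splits so the scheduler has fewer short-lived tasks to launch:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.LineRecordReader;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
    import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
    import org.apache.hadoop.mapred.lib.CombineFileSplit;

    public class CombinedTextInputFormat
        extends CombineFileInputFormat<LongWritable, Text> {

      public CombinedTextInputFormat() {
        setMaxSplitSize(128L * 1024 * 1024);  // cap per combined split (example)
      }

      @SuppressWarnings("unchecked")
      @Override
      public RecordReader<LongWritable, Text> getRecordReader(
          InputSplit split, JobConf conf, Reporter reporter) throws IOException {
        return new CombineFileRecordReader<LongWritable, Text>(
            conf, (CombineFileSplit) split, reporter,
            (Class) CombinedLineReader.class);
      }

      // Reads the idx-th file of a combined split with the stock line reader.
      public static class CombinedLineReader extends LineRecordReader {
        public CombinedLineReader(CombineFileSplit split, Configuration conf,
                                  Reporter reporter, Integer idx) throws IOException {
          super(conf, new FileSplit(split.getPath(idx), split.getOffset(idx),
                                    split.getLength(idx), split.getLocations()));
        }
      }
    }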

RE: FairScheduler Local Task Restriction

2011-09-22 Thread GOEKE, MATTHEW (AG/1000)
I don't think that this is hurting you in this particular instance but you are *most likely* oversubscribing your boxes with this configuration (a monitoring system like Ganglia will be able to confirm this). You will want at least 1 dedicated physical core for the DN and TT daemons (some would
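
To make the arithmetic concrete with Adam's numbers: 12 physical cores minus one each for the DataNode and TaskTracker daemons leaves roughly 10 cores, while 16 map + 8 reduce slots allow 24 concurrent tasks when fully loaded. A hedged sketch of the per-tasktracker caps involved (the values are illustrative only, not a recommendation for Adam's cluster):

    import org.apache.hadoop.conf.Configuration;

    public class SlotSizingSketch {
      public static Configuration configure() {
        Configuration conf = new Configuration();
        // Roughly one slot per usable core, split between maps and reduces.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 7);     // example value
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 3);  // example value
        return conf;
      }
    }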

RE: FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
It is enabled -- I'll try disabling it. The job runs over a large number of files that are each roughly the size of an HDFS block, so fewer splits isn't much of an option. -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Thursday, September 22, 2011 4:28 PM To: m

RE: FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
Each box has 3 (seems weird) quad-core HT'ed CPUs for a total of 12 HT'ed cores per machine. 16 map tasks and 8 reduce tasks each. -- Adam From: GOEKE, MATTHEW (AG/1000) [mailto:matthew.go...@monsanto.com] Sent: Thursday, September 22, 2011 4:26 PM To: mapreduce-user@hadoop.apache.org Subject:

Re: FairScheduler Local Task Restriction

2011-09-22 Thread Joey Echeverria
Do you have assign multiple enabled in the fair scheduler? Even that may not be able to keep up if the tasks are only taking 10 seconds. Is there any way you could run the job with fewer splits? On Thu, Sep 22, 2011 at 1:21 PM, Adam Shook wrote: > Okay, I put a Thread.sleep to test my theory and it will r
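
For reference, the setting Joey is asking about is the FairScheduler's per-heartbeat assignment flag; without it the scheduler hands out at most one task per tasktracker heartbeat. A sketch of the mapred-site.xml entry via the Java API:

    import org.apache.hadoop.conf.Configuration;

    public class AssignMultipleSketch {
      public static Configuration configure() {
        Configuration conf = new Configuration();
        // Let the FairScheduler assign more than one task per heartbeat.
        conf.setBoolean("mapred.fairscheduler.assignmultiple", true);
        return conf;
      }
    }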

RE: FairScheduler Local Task Restriction

2011-09-22 Thread GOEKE, MATTHEW (AG/1000)
Just to confirm your configuration, how many logical cores do these boxes actually have (I am assuming dual quad core HT'ed)? Do you not have any reduce slots allocated? Matt From: Adam Shook [mailto:ash...@clearedgeit.com] Sent: Thursday, September 22, 2011 3:22 PM To: mapreduce-user@hadoop.ap

RE: FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
Okay, I put in a Thread.sleep to test my theory, and it will run all 128 at a time - they are just completing too quickly. I guess there is no other way to get around it, unless someone knows how to make the scheduler schedule faster... -- Adam From: Adam Shook [mailto:ash...@clearedgeit.com] Sent
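
A sketch of the kind of test Adam describes (the one-minute delay and pass-through types are arbitrary): padding each map task so it outlives the scheduler's assignment cycle makes full slot usage visible.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SlowMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {
      public void map(LongWritable key, Text value,
                      OutputCollector<LongWritable, Text> out, Reporter reporter)
          throws IOException {
        try {
          Thread.sleep(60 * 1000L);  // artificial delay for the experiment
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        out.collect(key, value);     // pass-through
      }
    }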

RE: FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
It's an 8 node cluster and all 8 task trackers are being used. Each tracker has 16 max map tasks. Each tracker seems to be running two at a time. Map tasks take 10 seconds from start to finish. Is it possible that they are just completing faster than they can be created and it just seems to

RE: FairScheduler Local Task Restriction

2011-09-22 Thread GOEKE, MATTHEW (AG/1000)
If you dig into the job history on the web UI, can you confirm whether it is the same 16 tasktracker slots that are getting the map tasks? It's a long shot, but it could be that it is actually distributing across your cluster and there is some other issue that is springing up. Also, how long does each o

FairScheduler Local Task Restriction

2011-09-22 Thread Adam Shook
Hello All, I have recently switched my small Hadoop dev cluster (v0.20.1) to use the FairScheduler. I have a max of 128 map tasks available and recently noticed that my jobs seem to use a maximum of 16 at any given time (the job I am looking at in particular runs for about 15 minutes) - they a
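
For context, switching a 0.20 cluster to the FairScheduler is itself just configuration; a hedged sketch (the allocation-file path is a placeholder):

    import org.apache.hadoop.conf.Configuration;

    public class FairSchedulerSetupSketch {
      public static Configuration configure() {
        Configuration conf = new Configuration();
        conf.set("mapred.jobtracker.taskScheduler",
                 "org.apache.hadoop.mapred.FairScheduler");
        conf.set("mapred.fairscheduler.allocation.file",
                 "/path/to/fair-scheduler.xml");  // placeholder path
        return conf;
      }
    }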

Re: How do I set the intermediate output path when I use 2 mapreduce jobs?

2011-09-22 Thread Robert Evans
Sorry about the confusion then. Look at the code that Swathi sent. Your problem is probably with timing somewhere. You may be launching the second job before the first one has completely finished. --Bobby Evans On 9/21/11 7:27 PM, "谭军" wrote: Bobby Evans Temp files are on HDFS not on local fi
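
A hedged sketch of the timing fix Bobby describes, in the old mapred API (paths and job setup are placeholders): JobClient.runJob blocks until the job finishes, so the second job cannot start before the first one's intermediate output is complete on HDFS.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ChainedJobs {
      public static void main(String[] args) throws Exception {
        Path intermediate = new Path("/tmp/intermediate");  // placeholder path

        JobConf job1 = new JobConf(ChainedJobs.class);      // mapper/reducer setup omitted
        FileInputFormat.setInputPaths(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, intermediate);
        JobClient.runJob(job1);  // blocks until job1 completes (or throws)

        JobConf job2 = new JobConf(ChainedJobs.class);      // mapper/reducer setup omitted
        FileInputFormat.setInputPaths(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        JobClient.runJob(job2);  // only runs once job1's output is on HDFS
      }
    }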

Re: Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
Thanks, got the point. So, the shuffle and sort can happen in parallel even before all the map tasks are completed, but the reduce happens only after all the map tasks are complete. Praveen On Thu, Sep 22, 2011 at 7:13 PM, Joey Echeverria wrote: > In most cases, your job will have more map task

Re: RE: Making Mumak work with capacity scheduler

2011-09-22 Thread Uma Maheswara Rao G 72686
Yes Devaraj, From the logs, it looks like it failed to create /jobtracker/jobsInfo. Code snippet: if (!fs.exists(path)) { if (!fs.mkdirs(path, new FsPermission(JOB_STATUS_STORE_DIR_PERMISSION))) { throw new IOException( "CompletedJobStatusStore mkdirs failed to create "
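
The quoted snippet, re-indented for readability (the tail of the exception message is truncated in the original mail):

    if (!fs.exists(path)) {
      if (!fs.mkdirs(path, new FsPermission(JOB_STATUS_STORE_DIR_PERMISSION))) {
        throw new IOException(
            "CompletedJobStatusStore mkdirs failed to create " /* truncated */);
      }
    }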

Re: Regarding FIFO scheduler

2011-09-22 Thread Joey Echeverria
In most cases, your job will have more map tasks than map slots. You want the reducers to spin up at some point before all your maps complete, so that the shuffle and sort can work in parallel with some of your map tasks. I usually set slow start to 80%, sometimes higher if I know the maps are slow
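
Joey's 80% rule of thumb as a config sketch, using the 0.20-era property name (the new-API spelling Praveen quotes below is mapreduce.job.reduce.slowstart.completedmaps):

    import org.apache.hadoop.mapred.JobConf;

    public class SlowstartSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Schedule reducers only after 80% of map tasks have completed, so the
        // shuffle overlaps the tail of the maps without idling reduce slots.
        conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
      }
    }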

Re: Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
Joey, Thanks for the response. 'mapreduce.job.reduce.slowstart.completedmaps' defaults to 0.05, and its description says 'Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.' Shouldn't the map tasks be completed before the reduce tasks are kicked for

Re: Regarding FIFO scheduler

2011-09-22 Thread Joey Echeverria
The jobs would run in parallel since J1 doesn't use all of your map slots. Things get more interesting with reduce slots. If J1 is an overall slower job, and you haven't configured mapred.reduce.slowstart.completed.maps, then J1 could launch a bunch of idle reduce tasks which would starve J2. In g

Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
Hi, Let's assume that there are two jobs J1 (100 map tasks) and J2 (200 map tasks), that the cluster has a capacity of 150 map tasks (15 nodes with 10 map tasks per node), and that Hadoop is using the default FIFO scheduler. If I submit first J1 and then J2, will the jobs run in parallel or the job J1 has