Hi Sreekanth,

When you mention setting the max task limit, do you mean executing

set mapred.capacity-scheduler.queue.<queue-name>.maximum-capacity = <a percentage> ?

Is it only available on Hadoop 0.21?

Thanks,
Rosanna

On 5/1/11 8:42 PM, "Sreekanth Ramakrishnan" <sreer...@yahoo-inc.com> wrote:
>
> The design goal of the CapacityScheduler is to maximize the utilization of
> cluster resources, but it does not fairly allocate the share among the total
> number of users present in the system.
>
> The user limit states the number of concurrent users who can use the slots
> in a queue. These limits are elastic in nature: since there is no preemption,
> as slots get freed up, new tasks will be allotted those slots to meet the
> user limit.
>
> For your requirement, you could submit the large jobs to a queue which has a
> max task limit set, so your long-running jobs don't take up the whole of the
> cluster capacity, and submit shorter, smaller jobs to a fast-moving queue
> with something like a 10% user limit, which allows 10 concurrent users per
> queue.
>
> The actual distribution of capacity across longer/shorter jobs depends on
> your workload.
>
>
> On 4/30/11 1:14 AM, "Rosanna Man" <rosa...@auditude.com> wrote:
>
>> Hi Sreekanth,
>>
>> Thank you very much for your clarification. Setting the max task limits on
>> queues will work, but can we do something about the max user limit? Is it
>> preemptible also? We are exploring the possibility of running the queries
>> as different users so that the capacity scheduler can maximize the use of
>> the resources.
>>
>> Basically, our goal is to maximize the resources (mappers and reducers)
>> while providing a fair share to the short tasks while a big task is
>> running. How do you normally achieve that?
>>
>> Thanks,
>> Rosanna
>>
>> On 4/28/11 8:09 PM, "Sreekanth Ramakrishnan" <sreer...@yahoo-inc.com> wrote:
>>
>>> Hi,
>>>
>>> Currently the CapacityScheduler does not have preemption. So basically,
>>> as Job1 starts finishing and freeing up slots, Job2's tasks will start
>>> getting scheduled. One way you can prevent queue capacities from being
>>> elastic in nature is by setting max task limits on queues. That way your
>>> Job1 will never exceed the first queue's capacity.
>>>
>>>
>>> On 4/28/11 11:48 PM, "Rosanna Man" <rosa...@auditude.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are using the capacity scheduler to schedule resources among different
>>>> queues for one user (hadoop) only. We have set the queues to have equal
>>>> shares of the resources. However, when the 1st task starts in the first
>>>> queue and consumes all the resources, the 2nd task started in the 2nd
>>>> queue is starved of reducers until the first task finishes. A lot of
>>>> processing gets stuck while a large query is executing.
>>>>
>>>> We are using Hive on Hadoop 0.20.2 in Amazon AWS. We tried to use the
>>>> Fair Scheduler before, but it gives an error when the mapper produces no
>>>> output (which is fine in our use cases).
>>>>
>>>> Can anyone give us some advice?
>>>>
>>>> Thanks,
>>>> Rosanna
>>
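[For anyone finding this thread later, the setup Sreekanth describes could be sketched in capacity-scheduler.xml roughly as below. This is an untested sketch: the queue names "longjobs" and "fast" are hypothetical, and the maximum-capacity property assumes a CapacityScheduler version that supports it, which (per the question above) may not include stock 0.20.2.]

```xml
<!-- capacity-scheduler.xml sketch, NOT a tested config.
     Queue names "longjobs" and "fast" are made up for illustration;
     the queues themselves must also be declared via mapred.queue.names. -->
<configuration>
  <property>
    <name>mapred.capacity-scheduler.queue.longjobs.capacity</name>
    <value>50</value>
  </property>
  <property>
    <!-- hard cap so long-running jobs cannot elastically take the
         whole cluster (assumes a version with maximum-capacity support) -->
    <name>mapred.capacity-scheduler.queue.longjobs.maximum-capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.fast.capacity</name>
    <value>50</value>
  </property>
  <property>
    <!-- each user is guaranteed at least 10% of the queue's capacity,
         i.e. up to 10 concurrent users share the fast queue -->
    <name>mapred.capacity-scheduler.queue.fast.minimum-user-limit-percent</name>
    <value>10</value>
  </property>
</configuration>
```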