> We would like to configure the equivalent of Fair Scheduler > userMaxJobsDefault = 1 (i.e. we would like to limit a user to a single job > in the cluster). > > > > · By default the Capacity Scheduler allows multiple jobs from a > single user to run concurrently. > > > > · From > http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/ > there > appear to be limits for “the number of accepted/active jobs per user”. > However, the example capacity-scheduler.xml only has limits for active > tasks e.g. <queue>.maximum-initialized-active-tasks-per-user property. > > > > · Also the source CapacitySchedulerConf.java includes the > following code which suggests that the maximum jobs per user can be > configured via the init-accept-jobs-factor property. However, this is not > clear from the description of this property. > > > > * public int getInitToAcceptJobsFactor(String queue) {* > > * int initToAccepFactor =* > > * rmConf.getInt(toFullPropertyName(queue, "init-accept-jobs-factor"), > * > > * defaultInitToAcceptJobsFactor);* > > * if(initToAccepFactor <= 0) {* > > * throw new IllegalArgumentException(* > > * "Invalid maximum jobs per user configuration " + > initToAccepFactor);* > > * }* > > * return initToAccepFactor;* > > * }* > >
Single job in the whole cluster for the user or a shared queue? For a specific user or for every user? Handling specific users should be via special queues. If you want it for every user in every shared queue, you can use init-accept-jobs-factor: int maxSystemJobs = conf.getMaxSystemJobs(); float capacityPercent = conf.getCapacity(queueName); int maxJobsPerUserToInit = (int)Math.ceil(maxSystemJobs * capacityPercent/100.0 * ulMin/100.0); int jobInitToAcceptFactor = conf.getInitToAcceptJobsFactor(queueName); int maxJobsPerUserToAccept = maxJobsPerUserToInit * jobInitToAcceptFactor; So, if max-system jobs is 1000, and the queue capacity is 10%, assuming ulimit 100% (one user at a time), maxJobsPerUserToInit will be 100 and so you will need to set jobInit factor for that queue to be 100. If the user limit is say 20% (max of 5 users at a time), then maxJobsPerUserToInit will be 20 and so you will need to set jobInit factor to be 20. It's a bit complicated but achievable. Also see http://hadoop.apache.org/docs/stable/capacity_scheduler.html for reference. > · Also, other posts and sample xml files on the web refer to > mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user > property. However, I’ve tried setting this to 1 but it has no impact. > > This configuration is used to control 'initialized' jobs per user. Initialized jobs occupy more memory in JobTracker than non-initialized ones, so this property is used to control the max number of inited jobs. Can you state your original requirement? You don't want any user to run more than one job at a time? Why? Also, I’m curious… a benefit of the Capacity Scheduler is that resource > limits can be specified in percentage terms, so if the cluster size changed > the CS configuration would not have to change. Therefore, why are some > properties specified in terms of tasks e.g. > mapred.capacity-scheduler.queue.<queue>.maximum-initialized-active-tasks-per-user > which would need to be reconfigured if the cluster size changed? Some of these limits help control static resources like JobTracker memory and so are not a function of cluster capacity. HTH -- +Vinod Hortonworks Inc. http://hortonworks.com/