Lin,

The article you are reading is old.
The Fair Scheduler does have preemption.
Tasks get killed and rerun later, potentially on a different node.

You can set a minimum / guaranteed capacity per pool. The sum of those minimums 
across pools would typically be equal to or less than the total capacity of your cluster.
Each pool can also be configured to go beyond its minimum. That happens 
when the cluster is temporarily not used to full capacity.
Then, when demand for capacity increases again, and jobs are queued in other pools 
that are running below their guaranteed minimum, some long running 
tasks from jobs in the pool that is using more than its minimum capacity get 
killed (to be run again later).
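
For example, with the MR1 Fair Scheduler this goes into the allocations file 
(the one mapred.fairscheduler.allocation.file points to). It would look roughly 
like the snippet below. The pool names and numbers are just illustrative, and 
the exact element names can vary between Hadoop versions, so check the docs 
for the release you run:

<?xml version="1.0"?>
<allocations>
  <!-- Production pool: guaranteed 20 map and 10 reduce slots.
       If it stays below that guarantee for 60 seconds, the scheduler
       starts killing tasks in pools running over their minimum. -->
  <pool name="production">
    <minMaps>20</minMaps>
    <minReduces>10</minReduces>
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </pool>
  <!-- Research pool: small guarantee, but it can still grow well
       beyond it while the cluster is otherwise idle. -->
  <pool name="research">
    <minMaps>5</minMaps>
    <minReduces>5</minReduces>
  </pool>
  <!-- Preempt after 10 minutes if a pool is below half its fair share. -->
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>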

Does that make sense?

Cheers,

Joep

Sent from my iPhone

On Jan 20, 2013, at 6:25 AM, Lin Ma <lin...@gmail.com> wrote:

> Hi guys,
> 
> I have a quick question regarding the fair scheduler of Hadoop. I am reading 
> this article => 
> http://blog.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/, and my question 
> is about the following statement: "There is currently no support for 
> preemption of long tasks, but this is being added in HADOOP-4665, which will 
> allow you to set how long each pool will wait before preempting other jobs’ 
> tasks to reach its guaranteed capacity.".
> 
> My questions are,
> 
> 1. What does "preemption of long tasks" mean? Killing long running tasks, pausing 
> long running tasks to give resources to other tasks, or something 
> else?
> 2. I am also confused about "set how long each pool will wait before 
> preempting other jobs’ tasks to reach its guaranteed capacity". What does 
> "reach its guaranteed capacity" mean? I think when using the fair scheduler, each pool 
> has predefined resource allocation settings (and the settings guarantee that 
> each pool has resources as configured), is that true? In what situations would a 
> pool not have its guaranteed (or configured) capacity?
> 
> regards,
> Lin
