Another thing to point out is that the plan is to first ask the AM to give resources back before just taking them. I'm not sure whether that has been implemented in any of the schedulers yet, but it would at least let the AM try to make an intelligent decision about what to preempt and how to preempt it. That means it could checkpoint state in the task and not lose any progress, or it could judge from the rate of progress so far that the task is so close to finishing that it's worth gambling and letting it try to finish on its own. It also means that it could try to give back other resources that would be less costly for it to lose.
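The checkpoint-or-gamble choice described above can be sketched as a simple AM-side policy. This is only an illustration, not the YARN AM/RM protocol; the function name, arguments, and threshold logic are all hypothetical:

```python
def respond_to_preemption(progress, elapsed_s, grace_s):
    """Hypothetical AM-side decision when the RM asks for resources back.

    progress:  fraction of the task completed so far (0.0 - 1.0)
    elapsed_s: seconds the task has been running
    grace_s:   seconds the scheduler waits before forcibly killing
    """
    if progress <= 0.0:
        return "checkpoint"  # no progress data yet; nothing to gamble on
    rate = progress / elapsed_s           # observed progress per second
    remaining_s = (1.0 - progress) / rate # naive linear extrapolation
    if remaining_s <= grace_s:
        # Gamble: the task should finish before the grace period expires.
        return "finish"
    # Otherwise checkpoint state so no progress is lost when preempted.
    return "checkpoint"
```

For example, a task 90% done after 90 seconds extrapolates to 10 seconds remaining, so with a 30-second grace period the AM would let it run; a task only 20% done after 60 seconds would checkpoint instead.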
--Bobby

On 10/13/13 3:51 PM, "Sandy Ryza" <[email protected]> wrote:

> Hi Hilfi,
>
> This is handled differently by different schedulers. For the Fair
> Scheduler, preemption decisions are currently based only on memory. The
> way this works is that every few seconds we check whether any queues
> have unsatisfied demand under their share of memory. If so, we look for
> containers in queues that are over their share of memory and kill them.
> We don't take into account how close a job is to finishing when deciding
> which containers to kill. The plan for extending this to multiple
> resources is to trigger preemption if either CPU or memory has
> unsatisfied demand under its share. For deciding which containers to
> kill, we'll traverse the queue/app hierarchy in the reverse order that
> we do when assigning containers. This means that we'll use DRF to decide
> which application to kill containers from.
>
> The Capacity Scheduler currently does not incorporate multiple resources
> into preemption decisions either. In it, IIUC, DRF is only used within
> queues, and preemption happens between queues, so I'm not sure what its
> plans are for incorporating multiple resources in preemption.
>
> Hope that helps,
> -Sandy
>
>
> On Thu, Oct 10, 2013 at 11:58 PM, hilfi alkaff <[email protected]> wrote:
>
>> Hi,
>>
>> I'm curious about the interaction between the DRF algorithm and
>> preemption in Hadoop. Let's say a job enters Hadoop by itself, so it
>> could get all of the CPU in the whole cluster. Then another job comes
>> in and would like to request some CPU resources.
>>
>> DRF then kicks in, and the first job potentially needs to be
>> deallocated from the machines it is running on, correct? However, if
>> the first job is close to finishing or is operating on a huge dataset,
>> then it might actually be beneficial to wait for the first job to
>> finish instead of kicking it out.
>>
>> How does the DRF algorithm implemented in Hadoop 2.0.x handle the
>> above situation? Hopefully my explanation is clear.
>>
>> Thanks in advance,
>>
>> --
>> ~Hilfi Alkaff~
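Sandy's description of the planned multi-resource scheme (trigger preemption when any resource has unsatisfied demand under its share, then pick victims in reverse DRF order) can be sketched roughly as below. The data structures and function names are illustrative only, not the actual Fair Scheduler code:

```python
def dominant_share(usage, cluster):
    # DRF: an app's dominant share is its largest per-resource fraction
    # of the cluster's total capacity.
    return max(usage[r] / cluster[r] for r in cluster)

def needs_preemption(queue, cluster):
    # Trigger if, for any resource, the queue is using less than both
    # what it demands and its fair share (memory-only today; CPU would
    # be added under the planned extension).
    return any(
        queue["usage"][r] < min(queue["demand"][r], queue["fair_share"][r])
        for r in cluster
    )

def pick_victim_app(apps, cluster):
    # Reverse of the assignment order: take containers first from the
    # app whose dominant share is currently the highest.
    return max(apps, key=lambda name: dominant_share(apps[name], cluster))

cluster = {"memory": 100, "cpu": 100}
apps = {
    "A": {"memory": 70, "cpu": 10},  # dominant share 0.70 (memory-bound)
    "B": {"memory": 10, "cpu": 40},  # dominant share 0.40 (cpu-bound)
}
```

With these numbers, app A would lose containers first even though B uses more CPU, because A's dominant share (70% of memory) exceeds B's (40% of CPU). Note this sketch, like the real scheduler Sandy describes, still ignores how close each job is to finishing.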
