[ 
https://issues.apache.org/jira/browse/YARN-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706181#comment-15706181
 ] 

Konstantinos Karanasos commented on YARN-5886:
----------------------------------------------

Thank you for the feedback, [~cxcw].

bq. Microsoft also published a paper in ATC about this 
feature. Here are some of my concerns.
Which paper are you referring to? We had a paper in EuroSys 2016 ("Efficient 
Queue Management for Cluster Scheduling"), in which we investigated 
different queue reordering strategies, along with other techniques for 
efficient queue management (queue sizing, placement, etc.). We called the 
system Yaq (we had both a centralized and a distributed scheduling version). Is 
that the paper you meant?
Many of the techniques we are planning to add here will originate from Yaq.

bq. 1. How does the local NM ContainerScheduler coordinate with the global 
scheduler, since the global scheduler tries to maintain fairness and guaranteed 
shares across applications (queues)?
We plan to let the global scheduler send the tasks to the nodes. The 
reordering then happens only locally at each node, and only for the tasks that 
are queued there at that moment. Note that reordering applies only to 
opportunistic containers (guaranteed containers are not allowed to be 
queued). This way we do not affect the fairness guarantees of guaranteed 
containers.
If we want to do fairness across opportunistic containers, we will need some 
additional techniques (we did this through a timeout in the EuroSys paper).
Does this make sense, or did you have something else in mind?
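
To make the idea concrete, here is a rough sketch of a node-local queue that reorders only the opportunistic containers queued on that node. All class, field, and parameter names ({{QueuedContainer}}, {{LocalOpportunisticQueue}}, {{rank}}, {{maxWaitMs}}) are hypothetical and not part of YARN. The timeout rule is only one possible way to keep queued opportunistic containers from starving; it is an assumption for illustration, not necessarily the mechanism used in the EuroSys paper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a queued opportunistic container (not a YARN class).
final class QueuedContainer {
    final String id;
    final long arrivalMs; // when the container was queued at this node
    final int rank;       // illustrative ordering key; smaller runs first

    QueuedContainer(String id, long arrivalMs, int rank) {
        this.id = id;
        this.arrivalMs = arrivalMs;
        this.rank = rank;
    }
}

// Hypothetical node-local queue. Only opportunistic containers are placed here;
// guaranteed containers are assumed to bypass it entirely.
final class LocalOpportunisticQueue {
    private final List<QueuedContainer> queue = new ArrayList<>();
    private final Comparator<QueuedContainer> strategy; // pluggable reordering strategy
    private final long maxWaitMs;                        // assumed starvation timeout

    LocalOpportunisticQueue(Comparator<QueuedContainer> strategy, long maxWaitMs) {
        this.strategy = strategy;
        this.maxWaitMs = maxWaitMs;
    }

    void enqueue(QueuedContainer c) {
        queue.add(c);
    }

    // Returns the next container to start, or null if nothing is queued.
    QueuedContainer pollNext(long nowMs) {
        if (queue.isEmpty()) {
            return null;
        }
        QueuedContainer next = null;
        // Containers that waited longer than maxWaitMs go first, oldest first.
        for (QueuedContainer c : queue) {
            boolean timedOut = nowMs - c.arrivalMs > maxWaitMs;
            if (timedOut && (next == null || c.arrivalMs < next.arrivalMs)) {
                next = c;
            }
        }
        // Otherwise the pluggable strategy decides.
        if (next == null) {
            next = Collections.min(queue, strategy);
        }
        queue.remove(next);
        return next;
    }
}
```

With a strategy such as {{Comparator.comparingInt(c -> c.rank)}}, the queue follows the strategy until some container exceeds the timeout, at which point the longest-waiting container is started first.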

bq. 2. The NodeManager may not know (or be able to estimate) the runtime of a 
queued container. A wrong estimate (mistaking a long-running container for a 
short-running one) may have serious consequences (priority inversion?).
That is a good point. In the initial strategies, we plan not to take task 
duration into account (because it might not always be available, or might be 
imprecise, as you say). One option is to use the progress of the job, in terms 
of tasks completed. Later, if we introduce task durations, we can have even 
better strategies, but we will have to make sure they are robust to 
mis-estimations.
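
As an illustration of a duration-free strategy, the sketch below orders queued containers by the progress of their job (fraction of tasks completed) and falls back to arrival order on ties. The names {{ProgressAwareContainer}} and {{JobProgressStrategy}} are hypothetical, and the sketch assumes the completed and total task counts are somehow made available to the NM, which is not specified here.

```java
import java.util.Comparator;

// Hypothetical view of a queued container together with its job's progress.
final class ProgressAwareContainer {
    final String id;
    final long arrivalMs;     // when the container was queued at the NM
    final int completedTasks; // assumed to be supplied with the request
    final int totalTasks;     // 0 means "unknown"

    ProgressAwareContainer(String id, long arrivalMs, int completedTasks, int totalTasks) {
        this.id = id;
        this.arrivalMs = arrivalMs;
        this.completedTasks = completedTasks;
        this.totalTasks = totalTasks;
    }

    // Fraction of the job's tasks that completed; unknown totals count as 0.
    double jobProgress() {
        return totalTasks <= 0 ? 0.0 : (double) completedTasks / totalTasks;
    }
}

final class JobProgressStrategy {
    // Jobs closest to completion first; ties fall back to FIFO by arrival time.
    static final Comparator<ProgressAwareContainer> CLOSEST_TO_COMPLETION =
        Comparator.comparingDouble(ProgressAwareContainer::jobProgress)
                  .reversed()
                  .thenComparingLong((ProgressAwareContainer c) -> c.arrivalMs);
}
```

Because containers with unknown progress are treated as having progress 0 and are ordered by arrival time, this degrades to plain FIFO when no progress information is available.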

> Dynamically prioritize execution of opportunistic containers (NM queue 
> reordering)
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-5886
>                 URL: https://issues.apache.org/jira/browse/YARN-5886
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Konstantinos Karanasos
>            Assignee: Konstantinos Karanasos
>
> Currently the {{ContainerScheduler}} in the NM picks the next queued 
> opportunistic container to be executed in a FIFO manner. That is, we first 
> execute containers that arrived first at the NM.
> This JIRA proposes to add pluggable queue reordering strategies at the NM 
> that will dynamically determine which opportunistic container will be 
> executed next.
> For example, we can choose to prioritize containers that belong to jobs which 
> are closer to completion, or containers that are short-running (if such 
> information is available).



