[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-2883:
-----------------------------------------
    Attachment: YARN-2883-trunk.011.patch

[~kasha], thanks for the review.

I am attaching a new patch.

To keep my reply short, I am including below only your comments that required 
further discussion.
I agree with and have addressed all the rest.

bq. We seem to be overriding both {{performContainerStart}} and 
{{startContainerInternal}}

True, I got rid of {{performContainerStart}}, so now there is only 
{{startContainerInternal}}, which is being overridden.
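For illustration, the override structure now looks roughly like the sketch below. This is a minimal, self-contained sketch: the class names, the {{Object}} parameter types, and the method bodies are placeholders, not the actual signatures in the patch.
{code:java}
/** Minimal sketch of the override structure; names and parameter types are placeholders. */
class ContainerManagerSketch {
  protected void startContainerInternal(Object containerToken, Object request) {
    // Default path: start the container immediately.
  }
}

class QueuingContainerManagerSketch extends ContainerManagerSketch {
  @Override
  protected void startContainerInternal(Object containerToken, Object request) {
    // Queue opportunistic containers when resources are unavailable, possibly kill
    // opportunistic containers to make room for guaranteed ones, and otherwise fall
    // back to the default start path.
    super.startContainerInternal(containerToken, request);
  }
}
{code}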

bq. While the abbreviated variable names do make space, they make reading the 
code slightly harder. Can we reconsider using full names? I am comfortable with 
using alloc for allocated. Also, there seems to be a mismatch in variable names 
ending in containers and requests. How about using guaranteedAllocations, 
opportunisticAllocations, queuedGuaranteedAllocations and 
queuedOpportunisticAllocations?

I see your point about the mismatch, but I would not call them "allocations", 
because queued containers are not really allocations, and I'd also like to 
stress that the running ones are allocated (not just allocations). I used the 
following names to correct the mismatch and to avoid abbreviations: 
allocated(Guaranteed|Opportunistic)Containers and 
queued(Guaranteed|Opportunistic)Containers.
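For concreteness, the bookkeeping then looks roughly as follows (a simplified sketch with stand-in types, not the actual NM classes):
{code:java}
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

/** Simplified sketch of the renamed bookkeeping; the nested types stand in for the NM classes. */
class QueuingBookkeepingSketch {
  static class ContainerId {}
  static class ContainerInfo {}

  // Running containers that hold an allocation on the node.
  final Map<ContainerId, ContainerInfo> allocatedGuaranteedContainers = new LinkedHashMap<>();
  final Map<ContainerId, ContainerInfo> allocatedOpportunisticContainers = new LinkedHashMap<>();

  // Containers waiting in the NM queue; these are not allocations yet.
  final Queue<ContainerInfo> queuedGuaranteedContainers = new LinkedList<>();
  final Queue<ContainerInfo> queuedOpportunisticContainers = new LinkedList<>();
}
{code}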

bq. I still think we should never queue guaranteed containers. Is there a good 
reason for queuing them? Since we are killing opportunistic containers anyway, 
why not do that synchronously and start the guaranteed containers? The killing 
opportunistic containers should likely take a kill context, so we can handle it 
appropriately in YARN-1011 where we kill when the utilization goes over a 
threshold. Thinking more about this, I wonder if we should just defer to the 
kill when over a threshold utilization anyway once an opportunistic container 
has started.

There are multiple reasons for which I believe we should queue guaranteed 
containers:
# If we do the killing synchronously, the only thing we gain is avoiding the 
overhead of events, which I believe is negligible. On the other hand, 
synchronous killing means that if there is a problem killing one of the running 
opportunistic containers, the subsequently arriving guaranteed containers will 
also be blocked from executing. This is not the case when we do the killing via 
events.
# Supporting the queuing of guaranteed containers opens up further interesting 
opportunities:
## Assume an opportunistic container has been running for a while and you 
decide to give it a couple more seconds to finish its execution before killing 
it to make room for a guaranteed container. Having a queue of guaranteed 
containers makes such policies easy to implement (see the sketch after this 
list).
## One may decide to queue guaranteed containers on purpose. In our EuroSys 
paper this year ("Efficient queue management for cluster scheduling") we showed 
that there can be significant gains if you do such queuing carefully.
Overall, I believe it would be a pity to discard such opportunities, especially 
given that the code is already there.
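To make 2.1 concrete, here is a rough sketch of such a grace-period policy. All names and the fixed 2-second value are hypothetical; the point is only that a queue of guaranteed containers is what makes the deferral possible.
{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical grace-period policy sketch; names and the 2s value are illustrative only. */
class GracePeriodPolicySketch {
  static class QueuedGuaranteedContainer {
    final long queuedAtMillis = System.currentTimeMillis();
  }

  private final Queue<QueuedGuaranteedContainer> queuedGuaranteedContainers = new ArrayDeque<>();
  private static final long GRACE_PERIOD_MILLIS = 2000; // let opportunistic containers finish

  /** Called periodically: kill opportunistic containers only once the grace period expires. */
  void maybeFreeResourcesForHead(long nowMillis) {
    QueuedGuaranteedContainer head = queuedGuaranteedContainers.peek();
    if (head != null && nowMillis - head.queuedAtMillis >= GRACE_PERIOD_MILLIS) {
      // Kill enough running opportunistic containers to start the head guaranteed container.
    }
  }
}
{code}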

bq. {{killOpportContainers}} seems to not kill opportunistic containers if they 
started running. Doesn’t this mess with guarantees on the guaranteed 
containers? Am I missing something? Maybe we should just iterate in reverse 
through the queuedOpportunisticContainers and spawned opportunistic containers 
and kill as many as required?

I double-checked: {{killOpportContainers}} actually kills opportunistic 
containers only if they are running. Probably the method name 
{{stopContainerInternalIfNotQueued}} was confusing, so I renamed it to 
{{stopContainerInternalIfRunning}}.
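In other words, the check is roughly the following (simplified sketch; the state enum and map are stand-ins for the NM's container state tracking):
{code:java}
import java.util.Map;

/** Sketch of the "kill only if running" check; state handling is simplified. */
class KillIfRunningSketch {
  enum State { QUEUED, RUNNING, DONE }

  void stopContainerInternalIfRunning(String containerId, Map<String, State> containerStates) {
    // Only containers that actually started consume resources on the node, so only those
    // are killed; containers still in the queue are simply removed from the queue.
    if (containerStates.get(containerId) == State.RUNNING) {
      // Issue the kill for this opportunistic container.
    }
  }
}
{code}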

bq. I wonder if we are better off using a separate thread for starting pending 
containers periodically, and would wait between attempts. The code here could 
just notify that thread? That would likely work better when YARN-1011 wants to 
use a queue length > 1.

Our initial internal implementation actually followed such a design. The 
problem we observed is that such a thread adds overhead to the start of 
containers (and also does not play well with the current, non-thread-based 
design for starting containers). So I essentially rewrote this part of the code 
to avoid having such a thread. Note that for YARN-1011 we can add such a thread 
(very similar to the one currently used by the {{ContainersMonitorImpl}}), 
which can call the existing {{startPendingContainers()}} without any 
significant changes.
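For reference, the thread-based alternative discussed above (which could be added for YARN-1011) would look roughly like this; the class name and interval are illustrative, and the {{startPendingContainers()}} below merely stands in for the existing method of that name in the patch:
{code:java}
/** Sketch of a periodic starter thread, similar in spirit to ContainersMonitorImpl. */
class PendingContainersStarterSketch implements Runnable {
  private static final long INTERVAL_MILLIS = 1000; // illustrative interval
  private volatile boolean stopped = false;

  /** Stand-in for the existing startPendingContainers() logic. */
  void startPendingContainers() {
    // Start as many queued containers as the available resources allow.
  }

  @Override
  public void run() {
    while (!stopped && !Thread.currentThread().isInterrupted()) {
      startPendingContainers();
      try {
        Thread.sleep(INTERVAL_MILLIS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  void stop() {
    stopped = true;
  }
}
{code}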

bq. When queuing containers, depending on whether oversubscription is enabled 
or not, shouldn't we be checking utilization vs allocation respectively?

Agreed. I renamed {{hasAvailableAllocatedResources()}} to 
{{hasAvailableResources()}} to reflect this. We can then keep both an 
{{allocatedContainersUtilization}} and a {{usedContainersUtilization}}, and use 
the appropriate one depending on whether over-subscription is enabled.
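The resulting check is roughly the following (a memory-only sketch with illustrative names, not the actual resource-utilization types):
{code:java}
/** Memory-only sketch of the availability check; field and method names are illustrative. */
class AvailabilityCheckSketch {
  private final boolean overSubscriptionEnabled;
  private final long nodeMemoryMB;
  private long allocatedContainersMemoryMB; // sum of allocations of running containers
  private long usedContainersMemoryMB;      // actual aggregate usage of running containers

  AvailabilityCheckSketch(boolean overSubscriptionEnabled, long nodeMemoryMB) {
    this.overSubscriptionEnabled = overSubscriptionEnabled;
    this.nodeMemoryMB = nodeMemoryMB;
  }

  boolean hasAvailableResources(long requestedMemoryMB) {
    // With over-subscription, compare against actual usage; otherwise against allocation.
    long current = overSubscriptionEnabled ? usedContainersMemoryMB : allocatedContainersMemoryMB;
    return current + requestedMemoryMB <= nodeMemoryMB;
  }
}
{code}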

bq. {{resourcesToFreeUp}} takes {{opportContainersToKill}}, but I don't see any 
container ever being added here. Am I missing something or do we not need it 
and all related fields and method parameters can be removed?

Indeed, {{resourcesToFreeUp}} does not add to {{opportContainersToKill}} 
directly. It calculates the resources that need to be freed up for the given 
container to start; the containers to kill are picked elsewhere and placed in 
{{opportContainersToKill}}. However, {{resourcesToFreeUp}} needs 
{{opportContainersToKill}} in order to take into account the containers that 
have already been marked for killing (because a previous guaranteed container 
triggered some killing too) but have not been killed yet.
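Roughly, the computation looks like this (a memory-only sketch; the names mirror the discussion, but the types are simplified stand-ins):
{code:java}
import java.util.List;

/** Memory-only sketch of resourcesToFreeUp; types and fields are simplified stand-ins. */
class ResourcesToFreeUpSketch {
  static class OpportunisticContainer {
    long allocatedMemoryMB;
  }

  long resourcesToFreeUp(long requestedMemoryMB, long availableMemoryMB,
      List<OpportunisticContainer> opportContainersToKill) {
    long toFreeMB = requestedMemoryMB - availableMemoryMB;
    // Containers already marked for killing (by an earlier guaranteed container) will release
    // their resources shortly, so they count towards what is being freed up.
    for (OpportunisticContainer c : opportContainersToKill) {
      toFreeMB -= c.allocatedMemoryMB;
    }
    return Math.max(0, toFreeMB);
  }
}
{code}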

bq. If we are using utilization to decide whether to start an opportunistic 
container, why not just look at the aggregate containers’ utilization? Granted, 
we'll be able to make a decision about only one container at a time, but we 
don’t have to iterate through container lists and do arithmetic around resource 
utilization? Even if we switch to allocation and not utilization, we should 
still be able to go off of aggregate allocation. No? And, based on what we 
decide we might not need to track ProcessTreeInfo.

I am not sure I completely see what you mean. Instead of maintaining a resource 
utilization, we would each time have to look at the utilization of the running 
containers, plus some of the queued and some of the marked-for-killing ones.
If that is what you mean, then we would not really avoid arithmetic operations 
on resource utilization; if anything, we might increase them.

bq. Have we considered adding a map from containerId -> ResourceUtilization to 
assist with updating aggregate utilization when a container is removed etc., so 
we don’t have to explicitly track those separately?

Is this related to your previous comment?


> Queuing of container requests in the NM
> ---------------------------------------
>
>                 Key: YARN-2883
>                 URL: https://issues.apache.org/jira/browse/YARN-2883
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Konstantinos Karanasos
>            Assignee: Konstantinos Karanasos
>         Attachments: YARN-2883-trunk.004.patch, YARN-2883-trunk.005.patch, 
> YARN-2883-trunk.006.patch, YARN-2883-trunk.007.patch, 
> YARN-2883-trunk.008.patch, YARN-2883-trunk.009.patch, 
> YARN-2883-trunk.010.patch, YARN-2883-trunk.011.patch, 
> YARN-2883-yarn-2877.001.patch, YARN-2883-yarn-2877.002.patch, 
> YARN-2883-yarn-2877.003.patch, YARN-2883-yarn-2877.004.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
