[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234184#comment-15234184
 ] 

Karthik Kambatla commented on YARN-2883:
----------------------------------------

Thanks for refactoring the patch, [~kkaranasos]. This definitely feels more 
intuitive. I had to review over multiple sittings, so I might have missed a few 
things. Here are some comments (both high-level and nits). I will take a closer 
look on the next iteration. 

# Nit: A few lines introduced in the patch are over 80 chars. 
# Nit: When referring to fields in comments, please use <code></code> tags so 
the comments can keep up with code changes. 
# Nit: When evaluating boolean conditions in if or while statements, it makes 
for better reading if each sub-condition is entirely on one line. e.g., of the 
two snippets below, I would prefer the latter. There are other places in the 
code that could use similar minor adjustments.
{code}
if (resourcesToFreeUp.getPhysicalMemory() <= 0 && resourcesToFreeUp
    .getVirtualMemory() <= 0 && resourcesToFreeUp.getCPU() <= 0.0f) {
}
{code}
{code}
if (resourcesToFreeUp.getPhysicalMemory() <= 0 && 
    resourcesToFreeUp.getVirtualMemory() <= 0 &&
    resourcesToFreeUp.getCPU() <= 0.0f) {
{code}
# We seem to be overriding both {{performContainerStart}} and 
{{startContainerInternal}}
## It seems simpler to override only one. If we keep {{startContainerInternal}} 
private, the additional changes there can be moved to {{performContainerStart}}
## Alternatively, to keep the changes symmetric with {{stopContainerInternal}}, 
it might make sense to just override {{startContainerInternal}} and not have a 
{{performContainerStart}} method at all. 
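To illustrate the second option, a minimal, self-contained sketch (the class and method bodies here are simplified stand-ins, not the actual patch code): overriding only {{startContainerInternal}} keeps the extension surface to a single method, symmetric with {{stopContainerInternal}}, with no separate {{performContainerStart}} hook.

```java
// Sketch only: names mirror the patch's methods, the bodies are placeholders.
class ContainerManagerBase {
    protected void startContainerInternal(String containerId) {
        System.out.println("base start: " + containerId);
    }
    protected void stopContainerInternal(String containerId) {
        System.out.println("base stop: " + containerId);
    }
}

class QueuingContainerManager extends ContainerManagerBase {
    final java.util.Queue<String> queued = new java.util.ArrayDeque<>();

    @Override
    protected void startContainerInternal(String containerId) {
        // Queue instead of starting immediately; the queuing logic lives in
        // the single overridden method, so no performContainerStart is needed.
        queued.add(containerId);
    }
}
```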
# {{QueuingContainerManagerImpl}}
## While the abbreviated variable names do make space, they make reading the 
code slightly harder. Can we reconsider using full names? I am comfortable with 
using alloc for allocated. Also, there seems to be a mismatch between variable 
names ending in containers and those ending in requests. How about using 
{{guaranteedAllocations}}, {{opportunisticAllocations}}, 
{{queuedGuaranteedAllocations}} and {{queuedOpportunisticAllocations}}?
## Use {{ConcurrentLinkedQueue}} instead of 
{{Collections.synchronizedList(LinkedList)}}?
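For reference, a small JDK-only sketch of the suggested swap: {{ConcurrentLinkedQueue}} is a non-blocking queue whose iterators are weakly consistent, so offer/poll/iterate need no external locking, whereas a synchronized list still requires manual synchronization around iteration.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueChoice {
    public static void main(String[] args) {
        // Current approach: every access goes through one lock, and iteration
        // must still be guarded manually.
        List<String> syncList = Collections.synchronizedList(new LinkedList<>());
        syncList.add("container_1");
        synchronized (syncList) {          // required for safe iteration
            for (String c : syncList) {
                System.out.println(c);
            }
        }

        // Suggested approach: lock-free queue with weakly consistent
        // iterators; no external locking needed.
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("container_1");
        queue.offer("container_2");
        System.out.println(queue.poll());  // FIFO: container_1 comes out first
    }
}
```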
## I still think we should never queue guaranteed containers. Is there a good 
reason for queuing them? Since we are killing opportunistic containers anyway, 
why not do that synchronously and start the guaranteed containers? Killing 
opportunistic containers should likely take a kill context, so we can handle it 
appropriately in YARN-1011, where we kill when utilization goes over a 
threshold. Thinking more about this, I wonder if we should just defer to the 
over-threshold-utilization kill anyway once an opportunistic container has 
started. 
## Are we queuing guaranteed containers, so a single call to kill could free up 
resources for potentially multiple guaranteed containers? 
## Nit: Rename {{removeContainerFromQueues}} to {{removeQueuedContainer}}? 
## Does {{startContainersFromQueue}} require {{resourcesAvailable}} flag to be 
passed? 
## Nit: In {{startPendingContainers}}, the code might read better as follows:
{code}
if (resourcesAvailable) {
  startContainersFromQueue(queuedOpportRequests);
}
{code}
## {{killOpportContainers}} seems to not kill opportunistic containers once 
they have started running. Doesn’t this mess with the guarantees on guaranteed 
containers? Am I missing something? Maybe we should just iterate in reverse 
through the queuedOpportunisticContainers and the spawned opportunistic 
containers and kill as many as required? 
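A rough sketch of the reverse-iteration idea, with placeholder types and a flat per-container resource cost (the real code would use the NM's container objects and proper resource arithmetic):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

public class ReverseKillSketch {
    // Free up "needed" units by first dropping the most recently queued
    // opportunistic containers (nothing to kill, they never started), then
    // killing the most recently spawned running ones.
    static int freeUp(Deque<String> queuedOpport, Deque<String> runningOpport,
                      int needed, int perContainer) {
        Iterator<String> it = queuedOpport.descendingIterator();
        while (needed > 0 && it.hasNext()) {
            it.next();
            it.remove();             // dequeue: container never started
            needed -= perContainer;
        }
        it = runningOpport.descendingIterator();
        while (needed > 0 && it.hasNext()) {
            String c = it.next();
            it.remove();             // real code would send a kill event for c
            needed -= perContainer;
        }
        return Math.max(needed, 0);  // 0 if enough resources were reclaimed
    }
}
```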
## {{QueuingApplicationEventDispatcher#handle}}
### Nit: the if condition could use better formatting
### Nit: The comment mentions starting pending containers *if resources 
available*, but the code there doesn’t directly reflect it.
### I wonder if we are better off using a separate thread for starting pending 
containers periodically, and would wait between attempts. The code here could 
just notify that thread? That would likely work better when YARN-1011 wants to 
use a queue length > 1. 
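A minimal sketch of that design using plain wait/notify; {{startPendingContainers}} here is just a counter placeholder for the real start logic, and the dispatcher only signals:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PendingStarterSketch {
    private final Object lock = new Object();
    private boolean pendingWork = false;
    final AtomicInteger attempts = new AtomicInteger();

    // Background thread: waits until notified (with a timeout, so it also
    // retries periodically between attempts), then tries to start queued
    // containers.
    final Thread starter = new Thread(() -> {
        try {
            while (true) {
                synchronized (lock) {
                    if (!pendingWork) {
                        lock.wait(200);   // periodic retry interval
                    }
                    pendingWork = false;
                }
                startPendingContainers();
            }
        } catch (InterruptedException e) {
            // shutdown
        }
    });

    // The event dispatcher just notifies; it never starts containers itself.
    void signal() {
        synchronized (lock) {
            pendingWork = true;
            lock.notify();
        }
    }

    private void startPendingContainers() {
        attempts.incrementAndGet();       // placeholder for real start logic
    }
}
```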
# {{QueuingContainersMonitorImpl}}
## When queuing containers, depending on whether oversubscription is enabled or 
not, shouldn't we be checking utilization vs allocation respectively? 
## hasAllocatedResourcesAvailable: rename to hasResourcesAvailable if we are 
not differentiating between allocation and utilization.
## hasAllocatedResourcesAvailable should check vmem only when vmem check is 
enabled? 
## {{resourcesToFreeUp}} takes {{opportContainersToKill}}, but I don't see any 
container ever being added here. Am I missing something or do we not need it 
and all related fields and method parameters can be removed? 
## If we are using utilization to decide whether to start an opportunistic 
container, why not just look at the aggregate containers’ utilization? Granted, 
we’ll only be able to make a decision about one container at a time, but we 
won’t have to iterate through container lists and do arithmetic around resource 
utilization. Even if we switch to allocation instead of utilization, we should 
still be able to go off of the aggregate allocation, no? And, based on what we 
decide, we might not need to track ProcessTreeInfo. 
## Have we considered adding a map from containerId -> ResourceUtilization to 
assist with updating aggregate utilization when a container is removed etc., so 
we don’t have to explicitly track those separately? 
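A sketch of such a tracking map; the {{Util}} type below is a simplified stand-in for the per-container utilization record, not YARN's actual class:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UtilizationTracker {
    // Simplified placeholder for a per-container utilization snapshot.
    static final class Util {
        final long pmem; final float cpu;
        Util(long pmem, float cpu) { this.pmem = pmem; this.cpu = cpu; }
    }

    private final Map<String, Util> perContainer = new ConcurrentHashMap<>();
    private long aggPmem;
    private float aggCpu;

    synchronized void containerStarted(String id, Util u) {
        perContainer.put(id, u);
        aggPmem += u.pmem;
        aggCpu += u.cpu;
    }

    // On removal, the tracked entry tells us exactly what to subtract, so the
    // aggregate never has to be recomputed from scratch.
    synchronized void containerFinished(String id) {
        Util u = perContainer.remove(id);
        if (u != null) {
            aggPmem -= u.pmem;
            aggCpu -= u.cpu;
        }
    }

    synchronized long aggregatePmem() { return aggPmem; }
}
```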
## opportContainersToKill: We might want to log at INFO level instead of WARN.
# NodeManager
## Changes to createNMContext seem spurious?
# Can you provide more details on why we need to pass the execution type for 
start/stop container monitoring? 


> Queuing of container requests in the NM
> ---------------------------------------
>
>                 Key: YARN-2883
>                 URL: https://issues.apache.org/jira/browse/YARN-2883
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Konstantinos Karanasos
>            Assignee: Konstantinos Karanasos
>         Attachments: YARN-2883-trunk.004.patch, YARN-2883-trunk.005.patch, 
> YARN-2883-trunk.006.patch, YARN-2883-trunk.007.patch, 
> YARN-2883-trunk.008.patch, YARN-2883-trunk.009.patch, 
> YARN-2883-trunk.010.patch, YARN-2883-yarn-2877.001.patch, 
> YARN-2883-yarn-2877.002.patch, YARN-2883-yarn-2877.003.patch, 
> YARN-2883-yarn-2877.004.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
