[ 
https://issues.apache.org/jira/browse/YARN-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500300#comment-13500300
 ] 

Robert Joseph Evans commented on YARN-162:
------------------------------------------

Sid, I like the patch.  I have a few minor comments:

# There are a few TODOs added to the code: {code}// TODO This is broken. 
Container ID for the AM may not be 1.{code}, {code}// TODO Should the app 
really fail if log aggregation fails ?{code} and {code}// TODO Send out an 
event to the app. Currently since aggregation failure{code}.  I could not find 
an existing JIRA for the first one, so please file one for it.  The other two 
seem to be related to one another.  If you feel strongly that we should not 
fail an application when log aggregation does not work, then please file a 
separate JIRA for that; otherwise they should just be plain comments and not 
TODOs.
# I don't really like the name of the new config that was added.  It exposes 
the internal implementation of how we throttle the applications.  I would 
prefer to have it called something like 
yarn.nodemanager.log-aggregation.max-concurrent-apps.  But this is very minor.
# The new config was not added to yarn-default.xml.
# This is also very minor. Inside LogAggregationService.stopApp we are wrapping 
a Void callable inside another Void callable.  I would prefer it if we returned 
the original value instead of returning null.
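
To illustrate the last point, here is a minimal sketch of a wrapper that hands back whatever the wrapped callable returns instead of discarding it and returning null. The class name, the {{wrap}} helper and the {{after}} hook are hypothetical and are not taken from the patch or from LogAggregationService.

```java
import java.util.concurrent.Callable;

public class CallableWrapperSketch {

    // Hypothetical helper: runs the delegate, then a follow-up action, and
    // returns the delegate's own result rather than a hard-coded null.
    static <T> Callable<T> wrap(final Callable<T> delegate, final Runnable after) {
        return () -> {
            try {
                // Propagate the original value (and any exception) unchanged.
                return delegate.call();
            } finally {
                // Assumed stand-in for whatever bookkeeping the outer callable does.
                after.run();
            }
        };
    }

    public static void main(String[] args) throws Exception {
        Callable<String> inner = () -> "inner result";
        Callable<String> outer = wrap(inner, () -> System.out.println("cleanup ran"));
        System.out.println(outer.call());
    }
}
```

For a {{Callable<Void>}} the value is still null, but the wrapper no longer has to know that, and it keeps working if the inner callable ever returns something meaningful.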

With Jenkins' +1 I am OK with the change, but it is a large enough change that 
I am a bit nervous about pulling it into 0.23.5.  If you are OK with this, I 
will pull in a modified YARN-219 that addresses your comments, and then we can 
pull this into trunk, branch-2, and branch-0.23 (0.23.6).
                
> nodemanager log aggregation has scaling issues with namenode
> ------------------------------------------------------------
>
>                 Key: YARN-162
>                 URL: https://issues.apache.org/jira/browse/YARN-162
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.3
>            Reporter: Nathan Roberts
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: YARN-162.txt, YARN-162_WIP.txt
>
>
> Log aggregation causes an FD explosion on the namenode. On large clusters this 
> can exhaust FDs to the point where datanodes can't check in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
