[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-4697:
-----------------------------
    Target Version/s: 2.6.4, 2.8.0, 2.7.3

> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Critical
> Fix For: 2.9.0
>
> Attachments: yarn4697.001.patch, yarn4697.002.patch,
> yarn4697.003.patch, yarn4697.004.patch
>
>
> In LogAggregationService.java we create a thread pool to upload logs from
> the NodeManager to HDFS if log aggregation is turned on. This is a cached
> thread pool which, according to the javadoc, is an unlimited pool of threads.
> If there has been a problem with log aggregation, this can cause a problem
> on restart. The number of threads created at that point could be huge and
> would put a large load on the NameNode; in the worst case it could even
> bring the NameNode down due to file descriptor issues.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
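To illustrate the concern in the description, here is a minimal sketch of replacing an unbounded cached pool with a bounded one. It is not taken from the attached patches: the class name `BoundedAggregationPool`, the default of 100 threads, and the fallback for invalid values are all assumptions made for illustration. Only `Executors.newCachedThreadPool()` and `Executors.newFixedThreadPool(int)` are standard JDK calls.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not the actual LogAggregationService code.
final class BoundedAggregationPool {

    // Assumed default for this sketch; the real value is whatever the patch defines.
    static final int DEFAULT_POOL_SIZE = 100;

    // Fall back to the default when the configured size is not a positive number.
    static int sanitizePoolSize(int configured) {
        return configured > 0 ? configured : DEFAULT_POOL_SIZE;
    }

    // Unbounded alternative, for comparison: Executors.newCachedThreadPool()
    // creates a new thread whenever no idle thread is available.
    static ExecutorService createPool(int configuredSize) {
        // A fixed pool never runs more than the given number of threads;
        // extra tasks wait in the executor's queue instead of spawning threads.
        return Executors.newFixedThreadPool(sanitizePoolSize(configuredSize));
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = createPool(4);
        // Simulate many pending uploads, e.g. after a restart.
        for (int i = 0; i < 50; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(10); // stand-in for an upload to HDFS
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        // newFixedThreadPool returns a ThreadPoolExecutor, so the cast is safe here.
        int peak = ((ThreadPoolExecutor) pool).getLargestPoolSize();
        System.out.println("Peak thread count: " + peak);
    }
}
```

With a bounded pool, a backlog of applications waiting for aggregation turns into queued tasks rather than a burst of threads, which limits the number of simultaneous connections opened against the NameNode.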
[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-4697:
-----------------------------
    Priority: Critical  (was: Major)

> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Critical
> Fix For: 2.9.0
>
> Attachments: yarn4697.001.patch, yarn4697.002.patch,
> yarn4697.003.patch, yarn4697.004.patch
[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated YARN-4697:
-----------------------------
    Attachment: yarn4697.004.patch

New unit tests added for invalid values. Other comments addressed as well.

> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
>
> Attachments: yarn4697.001.patch, yarn4697.002.patch,
> yarn4697.003.patch, yarn4697.004.patch
[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated YARN-4697:
-----------------------------
    Attachment: yarn4697.003.patch

Thanks very much for your comments. I have updated the patch accordingly.

> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
>
> Attachments: yarn4697.001.patch, yarn4697.002.patch,
> yarn4697.003.patch
[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naganarasimha G R updated YARN-4697:
------------------------------------
    Assignee: Haibo Chen  (was: Naganarasimha G R)

> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
>
> Attachments: yarn4697.001.patch, yarn4697.002.patch
[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated YARN-4697:
-----------------------------
    Attachment: yarn4697.002.patch

> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
>
> Attachments: yarn4697.001.patch, yarn4697.002.patch
[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated YARN-4697:
-----------------------------
    Attachment: yarn4697.001.patch

> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
>
> Attachments: yarn4697.001.patch