[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107329#comment-17107329 ] David Mollitor commented on YARN-9863: -- Hey Team, What is the best way to make progress on this? > Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9863.1.patch, YARN-9863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9951) Unify Error Messages in container-executor
[ https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007523#comment-17007523 ] David Mollitor commented on YARN-9951: -- [~szegedim] [~adam.antal] Can you please review? > Unify Error Messages in container-executor > -- > > Key: YARN-9951 > URL: https://issues.apache.org/jira/browse/YARN-9951 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9951.1.patch, YARN-9951.2.patch > > > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c] > > Has several different ways of reporting errors: > > # Couldn't > # Can't > # Could not > # Failed to > # Unable to > # Other > > I think "Failed to" is the best verbiage. Contractions are hard for > non-native English-speaking folks. "Failed" is to the point, and I am more likely > to grep logs for 'fail' than for 'unable' or 'could not'.
[jira] [Updated] (YARN-9951) Unify Error Messages in container-executor
[ https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9951: - Attachment: YARN-9951.2.patch > Unify Error Messages in container-executor > -- > > Key: YARN-9951 > URL: https://issues.apache.org/jira/browse/YARN-9951 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9951.1.patch, YARN-9951.2.patch > > > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c] > > Has several different ways of reporting errors: > > # Couldn't > # Can't > # Could not > # Failed to > # Unable to > # Other > > I think "Failed to" is the best verbiage. Contractions are hard for > non-native English-speaking folks. "Failed" is to the point, and I am more likely > to grep logs for 'fail' than for 'unable' or 'could not'.
[jira] [Commented] (YARN-9951) Unify Error Messages in container-executor
[ https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995077#comment-16995077 ] David Mollitor commented on YARN-9951: -- [~adam.antal] [~pbacsko] All great suggestions. Thanks! Provided a new patch. Please consider for inclusion. > Unify Error Messages in container-executor > -- > > Key: YARN-9951 > URL: https://issues.apache.org/jira/browse/YARN-9951 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9951.1.patch, YARN-9951.2.patch > > > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c] > > Has several different ways of reporting errors: > > # Couldn't > # Can't > # Could not > # Failed to > # Unable to > # Other > > I think "Failed to" is the best verbiage. Contractions are hard for > non-native English-speaking folks. "Failed" is to the point, and I am more likely > to grep logs for 'fail' than for 'unable' or 'could not'.
[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995073#comment-16995073 ] David Mollitor commented on YARN-9863: -- [~pbacsko] Thank you for weighing in. You are correct. I do not have anything at the correct scale to test this. The customer I am working with is using CDH and does not have a process to patch, deploy, and test in production. I think the risk of it causing more harm than benefit is low. > Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9863.1.patch, YARN-9863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes.
[jira] [Created] (YARN-10024) TestReplicationStatus Suspicious Assertion
David Mollitor created YARN-10024:
----------------------------------
Summary: TestReplicationStatus Suspicious Assertion
Key: YARN-10024
URL: https://issues.apache.org/jira/browse/YARN-10024
Project: Hadoop YARN
Issue Type: Task
Reporter: David Mollitor

{code:java|TestReplicationStatus.java}
ServerMetrics sm = metrics.getLiveServerMetrics().get(server);
List rLoadSourceList = sm.getReplicationLoadSourceList();
// check SourceList still only has one entry
assertTrue("failed to get ReplicationLoadSourceList", (rLoadSourceList.size() == 2));
assertEquals(PEER_ID2, rLoadSourceList.get(0).getPeerID());
{code}

The comment claims that the source list should have only one entry, and the test asserts that the one entry is {{PEER_ID2}}, but the test also asserts that the list has two items in it.
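One way to avoid this kind of mismatch is to compare the size with an equality assertion, so that a failure reports the expected and actual counts, and to keep the comment next to it in sync. The following is only a sketch: the report does not say whether one or two entries is the correct expectation, so the value 2 is chosen purely for illustration, peer IDs are modeled as plain strings, and {{SourceListCheckSketch}}, {{checkSources}}, and the local {{assertEquals}} stand-in are hypothetical names, not code from the test.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the actual TestReplicationStatus code. */
final class SourceListCheckSketch {

  /** Minimal stand-in for JUnit's assertEquals so the sketch has no dependencies. */
  static void assertEquals(String message, Object expected, Object actual) {
    if (!expected.equals(actual)) {
      throw new AssertionError(
          message + ": expected <" + expected + "> but was <" + actual + ">");
    }
  }

  /** Checks the list size and the first peer ID; the comment and the check name the same count. */
  static void checkSources(List<String> peerIds, int expectedCount, String expectedFirstPeer) {
    // check that the source list has exactly expectedCount entries
    assertEquals("unexpected ReplicationLoadSourceList size", expectedCount, peerIds.size());
    assertEquals("unexpected first peer", expectedFirstPeer, peerIds.get(0));
  }

  public static void main(String[] args) {
    // Assumption for illustration only: two sources, with peer "2" listed first.
    checkSources(Arrays.asList("2", "1"), 2, "2");
    System.out.println("assertions passed");
  }
}
```

With an equality assertion, a wrong size fails with a message that names both counts instead of a generic "failed to get ReplicationLoadSourceList".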
[jira] [Commented] (YARN-9951) Unify Error Messages in container-executor
[ https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991884#comment-16991884 ] David Mollitor commented on YARN-9951: -- [~szegedim] > Unify Error Messages in container-executor > -- > > Key: YARN-9951 > URL: https://issues.apache.org/jira/browse/YARN-9951 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9951.1.patch > > > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c] > > Has several different ways of reporting errors: > > # Couldn't > # Can't > # Could not > # Failed to > # Unable to > # Other > > I think "Failed to" is the best verbiage. Contractions are hard for > non-native English-speaking folks. "Failed" is to the point, and I am more likely > to grep logs for 'fail' than for 'unable' or 'could not'.
[jira] [Commented] (YARN-9951) Unify Error Messages in container-executor
[ https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969446#comment-16969446 ] David Mollitor commented on YARN-9951: -- [~szegedim] Can you please take a peek at this one too? :) > Unify Error Messages in container-executor > -- > > Key: YARN-9951 > URL: https://issues.apache.org/jira/browse/YARN-9951 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9951.1.patch > > > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c] > > Has several different ways of reporting errors: > > # Couldn't > # Can't > # Could not > # Failed to > # Unable to > # Other > > I think "Failed to" is the best verbiage. Contractions are hard for > non-native English-speaking folks. "Failed" is to the point, and I am more likely > to grep logs for 'fail' than for 'unable' or 'could not'.
[jira] [Updated] (YARN-9951) Unify Error Messages in container-executor
[ https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated YARN-9951:
---------------------------------
Description:
[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c]

Has several different ways of reporting errors:
# Couldn't
# Can't
# Could not
# Failed to
# Unable to
# Other

I think "Failed to" is the best verbiage. Contractions are hard for non-native English-speaking folks. "Failed" is to the point, and I am more likely to grep logs for 'fail' than for 'unable' or 'could not'.

was:
[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c]

Has several different ways of reporting errors:
# Couldn't
# Can't
# Failed to
# Unable to
# Other

I think "Failed to" is the best verbiage. Contractions are hard for non-native English-speaking folks. "Failed" is to the point, and I am more likely to grep logs for 'fail' than for 'unable' or 'could not'.

> Unify Error Messages in container-executor > -- > > Key: YARN-9951 > URL: https://issues.apache.org/jira/browse/YARN-9951 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9951.1.patch > > > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c] > > Has several different ways of reporting errors: > > # Couldn't > # Can't > # Could not > # Failed to > # Unable to > # Other > > I think "Failed to" is the best verbiage. Contractions are hard for > non-native English-speaking folks. "Failed" is to the point, and I am more likely > to grep logs for 'fail' than for 'unable' or 'could not'.
[jira] [Updated] (YARN-9951) Unify Error Messages in container-executor
[ https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9951: - Attachment: YARN-9951.1.patch > Unify Error Messages in container-executor > -- > > Key: YARN-9951 > URL: https://issues.apache.org/jira/browse/YARN-9951 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9951.1.patch > > > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c] > > Has several different ways of reporting errors: > > # Couldn't > # Can't > # Failed to > # Unable to > # Other > > I think "Failed to" is the best verbiage. Contractions are hard for > non-native English-speaking folks. "Failed" is to the point, and I am more likely > to grep logs for 'fail' than for 'unable' or 'could not'.
[jira] [Updated] (YARN-9951) Unify Error Messages in container-executor
[ https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9951: - Summary: Unify Error Messages in container-executor (was: Unify Error Message in container-executor) > Unify Error Messages in container-executor > -- > > Key: YARN-9951 > URL: https://issues.apache.org/jira/browse/YARN-9951 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9951.1.patch > > > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c] > > Has several different ways of reporting errors: > > # Couldn't > # Can't > # Failed to > # Unable to > # Other > > I think "Failed to" is the best verbiage. Contractions are hard for > non-native English-speaking folks. "Failed" is to the point, and I am more likely > to grep logs for 'fail' than for 'unable' or 'could not'.
[jira] [Created] (YARN-9951) Unify Error Message in container-executor
David Mollitor created YARN-9951:
---------------------------------
Summary: Unify Error Message in container-executor
Key: YARN-9951
URL: https://issues.apache.org/jira/browse/YARN-9951
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c]

Has several different ways of reporting errors:
# Couldn't
# Can't
# Failed to
# Unable to
# Other

I think "Failed to" is the best verbiage. Contractions are hard for non-native English-speaking folks. "Failed" is to the point, and I am more likely to grep logs for 'fail' than for 'unable' or 'could not'.
[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957077#comment-16957077 ] David Mollitor commented on YARN-9863: -- [~szegedim] Any chance you've been able to review my remarks? Thanks! > Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9863.1.patch, YARN-9863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes.
[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949769#comment-16949769 ]

David Mollitor commented on YARN-9863:
--------------------------------------
[~szegedim] Thank you for your feedback.

The background here is that I am working with a large cluster that has one job in particular that is crushing it. This one job is required to localize many resources, of varying file sizes, for the job to complete.

As I understand YARN, when a job is submitted to the cluster, a list of files to localize is sent to each NodeManager involved in the job. In this case, all nodes are involved. All NodeManagers receive a carbon copy of the list of files from the ResourceManager (or maybe it's the 'yarn' client?). That is, they all have the same list, with the same ordering. The NodeManagers then iterate through the list and request that each file be localized. So, it would seem to me that all of the NodeManagers would request from HDFS file1, file2, file3, ... This would have a stampeding effect on the HDFS DataNodes.

I am familiar with {{mapreduce.client.submit.file.replication}}. I understand that this is used to pump up the replication of the submitted files so that they are available on more DataNodes. However, the way that it works, as I understand it, is that the file is first written to the HDFS cluster with the default replication (usually 3), and then the client requests that the file be replicated up to the final replication factor in a separate request (setrep). This replication process happens asynchronously. If {{mapreduce.client.submit.file.replication}} is set to 10, for example, the job may be submitted and finished before the file actually achieves a final replication of 10.

This becomes exacerbated on larger clusters. If a cluster has 1,000 nodes, the recommended value of {{mapreduce.client.submit.file.replication}} is sqrt(1000) or ~32. The default number of connections each DataNode can support is 10 ({{dfs.datanode.handler.count}}). So, even if the desired replication is achieved, that is 32 x 10 connections = 320 connections supported at once. In a cluster with 1,000 nodes, that is going to stall. By simply randomizing the list, the load can be spread across many sets of 32 nodes and better support this scenario.

For your questions:
# I'm not sure how HDFS would manage this. The requests are generated by the NodeManagers and the HDFS cluster is simply serving. They have no way to randomize the requests.
# SecureRandom. This is not a secure operation. It only requires a fast and pretty-good randomization of the list to spread the load.
# I believe that the parallel nature of the localization is configurable with {{yarn.nodemanager.localizer.fetch.thread-count}} (default 4), but I believe that the requests are submitted to a work-queue in order, so there will still be some level of trampling, especially if there are more than 4 files to localize (as is the case with the scenario I am reviewing).

> Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9863.1.patch, YARN-9863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes.
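The idea in the comment above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the code from the attached YARN-9863 patches: the class and method names ({{LocalizeOrderSketch}}, {{shuffledCopy}}, {{suggestedReplication}}, {{concurrentReaders}}) are hypothetical, resources are modeled as plain strings, and the capacity arithmetic simply restates the comment's own figures (sqrt of the cluster size, 10 handlers per DataNode).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; not the code from the YARN-9863 patches. */
final class LocalizeOrderSketch {

  /** Returns a shuffled copy so the caller's list keeps its original order. */
  static <T> List<T> shuffledCopy(List<T> resources, Random random) {
    List<T> copy = new ArrayList<>(resources);
    // A plain java.util.Random is enough here; the shuffle is not security sensitive.
    Collections.shuffle(copy, random);
    return copy;
  }

  /** Submit replication suggested in the comment: roughly sqrt(cluster size). */
  static int suggestedReplication(int clusterNodes) {
    return (int) Math.round(Math.sqrt(clusterNodes));
  }

  /** Concurrent readers of one file, as estimated in the comment: replicas x handlers per DataNode. */
  static int concurrentReaders(int replication, int handlersPerDataNode) {
    return replication * handlersPerDataNode;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList("file1", "file2", "file3", "file4");
    // Each simulated NodeManager gets its own ordering of the same list.
    for (int node = 0; node < 3; node++) {
      System.out.println("node" + node + " -> " + shuffledCopy(files, new Random()));
    }
    int replication = suggestedReplication(1000);
    System.out.println(replication + " replicas x 10 handlers = "
        + concurrentReaders(replication, 10) + " concurrent readers");
  }
}
```

Shuffling a copy rather than the list itself keeps the submitted order available to any caller that still depends on it; whether that matters for {{LocalResourceBuilder}} is not something the thread states.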
[jira] [Updated] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9863: - Attachment: YARN-9863.2.patch > Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9863.1.patch, YARN-9863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes.
[jira] [Updated] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9863: - Attachment: YARN-9863.1.patch > Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9863.1.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes.
[jira] [Moved] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor moved MAPREDUCE-7243 to YARN-9863:
-------------------------------------------------
Component/s: (was: performance)
             (was: nodemanager)
             nodemanager
Key: YARN-9863 (was: MAPREDUCE-7243)
Project: Hadoop YARN (was: Hadoop Map/Reduce)

> Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes,
[jira] [Updated] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9846: - Attachment: YARN-9846.3.patch > Use Finer-Grain Synchronization in ResourceLocalizationService > -- > > Key: YARN-9846 > URL: https://issues.apache.org/jira/browse/YARN-9846 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9846.1.patch, YARN-9846.2.patch, YARN-9846.3.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788 > # Remove these synchronization blocks > # Ensure {{recentlyCleanedLocalizers}} is thread safe
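As a rough illustration of the second item, a collection can be made thread safe without surrounding {{synchronized}} blocks by backing it with a concurrent set. The sketch below assumes, without confirmation from the thread, that {{recentlyCleanedLocalizers}} holds localizer IDs as strings; the class {{CleanedLocalizersSketch}} and its methods are hypothetical and are not taken from the YARN-9846 patches.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not the actual ResourceLocalizationService code. */
final class CleanedLocalizersSketch {

  // Concurrent set: add, contains, and remove are each atomic without an external lock.
  private final Set<String> recentlyCleanedLocalizers = ConcurrentHashMap.newKeySet();

  /** Records a cleaned-up localizer; returns false if it was already recorded. */
  boolean markCleaned(String localizerId) {
    return recentlyCleanedLocalizers.add(localizerId);
  }

  /** Reports whether the localizer was recorded as cleaned. */
  boolean wasRecentlyCleaned(String localizerId) {
    return recentlyCleanedLocalizers.contains(localizerId);
  }

  /** Removes the record; returns true if it was present. */
  boolean forget(String localizerId) {
    return recentlyCleanedLocalizers.remove(localizerId);
  }

  int size() {
    return recentlyCleanedLocalizers.size();
  }

  public static void main(String[] args) throws InterruptedException {
    CleanedLocalizersSketch tracker = new CleanedLocalizersSketch();
    Thread[] workers = new Thread[4];
    for (int t = 0; t < workers.length; t++) {
      final int worker = t;
      // Each worker records 1,000 distinct IDs concurrently with the others.
      workers[t] = new Thread(() -> {
        for (int i = 0; i < 1000; i++) {
          tracker.markCleaned("localizer-" + worker + "-" + i);
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      w.join();
    }
    System.out.println("recorded " + tracker.size() + " localizers");
  }
}
```

A concurrent set only makes individual operations atomic. Any check-then-act sequence that spans several calls, or that must stay consistent with another map, would still need its own coordination.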
[jira] [Updated] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9846: - Attachment: YARN-9846.2.patch > Use Finer-Grain Synchronization in ResourceLocalizationService > -- > > Key: YARN-9846 > URL: https://issues.apache.org/jira/browse/YARN-9846 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9846.1.patch, YARN-9846.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788 > # Remove these synchronization blocks > # Ensure {{recentlyCleanedLocalizers}} is thread safe
[jira] [Updated] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9846: - Attachment: (was: YARN-9846.2.patch) > Use Finer-Grain Synchronization in ResourceLocalizationService > -- > > Key: YARN-9846 > URL: https://issues.apache.org/jira/browse/YARN-9846 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9846.1.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788 > # Remove these synchronization blocks > # Ensure {{recentlyCleanedLocalizers}} is thread safe
[jira] [Updated] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9846: - Attachment: YARN-9846.2.patch > Use Finer-Grain Synchronization in ResourceLocalizationService > -- > > Key: YARN-9846 > URL: https://issues.apache.org/jira/browse/YARN-9846 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9846.1.patch, YARN-9846.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788 > # Remove these synchronization blocks > # Ensure {{recentlyCleanedLocalizers}} is thread safe
[jira] [Updated] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9846: - Attachment: YARN-9846.1.patch > Use Finer-Grain Synchronization in ResourceLocalizationService > -- > > Key: YARN-9846 > URL: https://issues.apache.org/jira/browse/YARN-9846 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9846.1.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788 > # Remove these synchronization blocks > # Ensure {{recentlyCleanedLocalizers}} is thread safe
[jira] [Updated] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated YARN-9846:
---------------------------------
    Summary: Use Finer-Grain Synchronization in ResourceLocalizationService  (was: User Finer-Grain Synchronization in ResourceLocalizationService)

> Use Finer-Grain Synchronization in ResourceLocalizationService
> --------------------------------------------------------------
>
>                 Key: YARN-9846
>                 URL: https://issues.apache.org/jira/browse/YARN-9846
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788
> # Remove these synchronization blocks
> # Ensure {{recentlyCleanedLocalizers}} is thread safe
[jira] [Updated] (YARN-9846) User Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated YARN-9846:
---------------------------------
    Summary: User Finer-Grain Synchronization in ResourceLocalizationService  (was: User Finer-Grain Synchronization in ResourceLocalizationService.java)

> User Finer-Grain Synchronization in ResourceLocalizationService
> ---------------------------------------------------------------
>
>                 Key: YARN-9846
>                 URL: https://issues.apache.org/jira/browse/YARN-9846
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788
> # Remove these synchronization blocks
> # Ensure {{recentlyCleanedLocalizers}} is thread safe
[jira] [Updated] (YARN-9846) User Finer-Grain Synchronization in ResourceLocalizationService.java
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated YARN-9846:
---------------------------------
    Summary: User Finer-Grain Synchronization in ResourceLocalizationService.java  (was: User Fineer-Grain Synchronization in ResourceLocalizationService.java)

> User Finer-Grain Synchronization in ResourceLocalizationService.java
> --------------------------------------------------------------------
>
>                 Key: YARN-9846
>                 URL: https://issues.apache.org/jira/browse/YARN-9846
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788
> # Remove these synchronization blocks
> # Ensure {{recentlyCleanedLocalizers}} is thread safe
[jira] [Updated] (YARN-9846) User Fineer-Grain Synchronization in ResourceLocalizationService.java
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated YARN-9846:
---------------------------------
    Flags: Patch

> User Fineer-Grain Synchronization in ResourceLocalizationService.java
> ---------------------------------------------------------------------
>
>                 Key: YARN-9846
>                 URL: https://issues.apache.org/jira/browse/YARN-9846
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788
> # Remove these synchronization blocks
> # Ensure {{recentlyCleanedLocalizers}} is thread safe
[jira] [Created] (YARN-9846) User Fineer-Grain Synchronization in ResourceLocalizationService.java
David Mollitor created YARN-9846:
---------------------------------

             Summary: User Fineer-Grain Synchronization in ResourceLocalizationService.java
                 Key: YARN-9846
                 URL: https://issues.apache.org/jira/browse/YARN-9846
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
    Affects Versions: 3.2.0
            Reporter: David Mollitor
            Assignee: David Mollitor

https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788
# Remove these synchronization blocks
# Ensure {{recentlyCleanedLocalizers}} is thread safe
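The two steps in the YARN-9846 description (drop the coarse synchronized blocks, make {{recentlyCleanedLocalizers}} thread safe) could be sketched roughly as below. This is not the attached patch. It assumes the field is a set of localizer ID strings, and the class and method names are hypothetical, chosen only for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the bookkeeping around recentlyCleanedLocalizers.
// Assumption: the field holds localizer IDs as strings.
final class CleanedLocalizerRegistry {

  // A concurrent set replaces a plain HashSet guarded by a synchronized block.
  private final Set<String> recentlyCleaned = ConcurrentHashMap.newKeySet();

  /** Records a cleaned localizer; returns true if it was not already recorded. */
  boolean markCleaned(String localizerId) {
    return recentlyCleaned.add(localizerId);
  }

  /**
   * Checks and clears the entry in one atomic call, so no external lock is
   * needed around a separate contains() followed by remove().
   */
  boolean consumeIfCleaned(String localizerId) {
    return recentlyCleaned.remove(localizerId);
  }

  int size() {
    return recentlyCleaned.size();
  }
}
```

The point of the sketch is that each operation is a single atomic call on the set. Any compound check-then-act sequence that spans several calls would still need its own coordination.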
[jira] [Updated] (YARN-9845) Update LocalResourcesTrackerImpl to Use Java 8 Map Concurrent API
[ https://issues.apache.org/jira/browse/YARN-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated YARN-9845:
---------------------------------
    Attachment: YARN-9845.1.patch

> Update LocalResourcesTrackerImpl to Use Java 8 Map Concurrent API
> -----------------------------------------------------------------
>
>                 Key: YARN-9845
>                 URL: https://issues.apache.org/jira/browse/YARN-9845
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: YARN-9845.1.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java#L467
> Class is using a {{ConcurrentHashMap}} but is not taking advantage of it.
[jira] [Updated] (YARN-9845) Update LocalResourcesTrackerImpl to Use Java 8 Map Concurrent API
[ https://issues.apache.org/jira/browse/YARN-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated YARN-9845:
---------------------------------
    Summary: Update LocalResourcesTrackerImpl to Use Java 8 Map Concurrent API  (was: Update to Use Java 8 Map Concurrent API)

> Update LocalResourcesTrackerImpl to Use Java 8 Map Concurrent API
> -----------------------------------------------------------------
>
>                 Key: YARN-9845
>                 URL: https://issues.apache.org/jira/browse/YARN-9845
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java#L467
> Class is using a {{ConcurrentHashMap}} but is not taking advantage of it.
[jira] [Created] (YARN-9845) Update to Use Java 8 Map Concurrent API
David Mollitor created YARN-9845:
---------------------------------

             Summary: Update to Use Java 8 Map Concurrent API
                 Key: YARN-9845
                 URL: https://issues.apache.org/jira/browse/YARN-9845
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
    Affects Versions: 3.2.0
            Reporter: David Mollitor
            Assignee: David Mollitor

https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java#L467
Class is using a {{ConcurrentHashMap}} but is not taking advantage of it.
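As an illustration of what "taking advantage of" a {{ConcurrentHashMap}} can look like with the Java 8 Map API, here is a minimal sketch. It is not code from LocalResourcesTrackerImpl or from the attached patch. The reference-counting scenario and all names in it are hypothetical. Only {{merge}} and {{computeIfPresent}} are real {{java.util.concurrent.ConcurrentMap}} methods.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example: per-key reference counts kept in a ConcurrentHashMap.
final class ResourceRefCounts {

  private final ConcurrentMap<String, Integer> refs = new ConcurrentHashMap<>();

  /** Atomically inserts 1 or increments the existing count; returns the new count. */
  int acquire(String key) {
    // Replaces the racy pattern: get(), null check, then put().
    return refs.merge(key, 1, Integer::sum);
  }

  /**
   * Atomically decrements the count and removes the entry when it reaches zero.
   * Returns the remaining count, or 0 if the key is absent or was just removed.
   */
  int release(String key) {
    // Returning null from the remapping function removes the mapping.
    Integer left = refs.computeIfPresent(key, (k, n) -> n > 1 ? n - 1 : null);
    return left == null ? 0 : left;
  }

  boolean isTracked(String key) {
    return refs.containsKey(key);
  }
}
```

On a {{ConcurrentHashMap}}, both calls run atomically per key, so callers need no external lock around the read-modify-write.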
[jira] [Updated] (YARN-9485) Add Target Cache Size to Cache Cleaner Output
[ https://issues.apache.org/jira/browse/YARN-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated YARN-9485:
---------------------------------
    Attachment: YARN-9485.1.patch

> Add Target Cache Size to Cache Cleaner Output
> ---------------------------------------------
>
>                 Key: YARN-9485
>                 URL: https://issues.apache.org/jira/browse/YARN-9485
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.2.0, 2.9.2
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: YARN-9485.1.patch
>
>
> Would have been helpful when looking at the logs, trying to track down an
> issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9485) Add Target Cache Size to Cache Cleaner Output
David Mollitor created YARN-9485:
---------------------------------

             Summary: Add Target Cache Size to Cache Cleaner Output
                 Key: YARN-9485
                 URL: https://issues.apache.org/jira/browse/YARN-9485
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
    Affects Versions: 2.9.2, 3.2.0
            Reporter: David Mollitor
            Assignee: David Mollitor
[jira] [Updated] (YARN-9485) Add Target Cache Size to Cache Cleaner Output
[ https://issues.apache.org/jira/browse/YARN-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated YARN-9485:
---------------------------------
    Description: Would have been helpful when looking at the logs, trying to track down an issue.

> Add Target Cache Size to Cache Cleaner Output
> ---------------------------------------------
>
>                 Key: YARN-9485
>                 URL: https://issues.apache.org/jira/browse/YARN-9485
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.2.0, 2.9.2
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>
> Would have been helpful when looking at the logs, trying to track down an
> issue.
[jira] [Commented] (YARN-6139) There are no docs for file localization
[ https://issues.apache.org/jira/browse/YARN-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818281#comment-16818281 ]

David Mollitor commented on YARN-6139:
--------------------------------------

https://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/

> There are no docs for file localization
> ---------------------------------------
>
>                 Key: YARN-6139
>                 URL: https://issues.apache.org/jira/browse/YARN-6139
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Priority: Major
>              Labels: documentation
>
> File localization is a major part of YARN and how it runs applications. The
> localization process is completely undocumented aside from the
> {{o.a.h.filecache.DistributedCache}} (MR1) API docs.
[jira] [Commented] (YARN-9347) Do not retry containers killed for "running beyond physical memory limits"
[ https://issues.apache.org/jira/browse/YARN-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797177#comment-16797177 ]

David Mollitor commented on YARN-9347:
--------------------------------------

[~starphin] Check out [MAPREDUCE-7180]

> Do not retry containers killed for "running beyond physical memory limits"
> --------------------------------------------------------------------------
>
>                 Key: YARN-9347
>                 URL: https://issues.apache.org/jira/browse/YARN-9347
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: applications
>            Reporter: David Mollitor
>            Priority: Minor
>
> If a container is killed for "running beyond physical memory limits," then do
> not re-launch this container. Simply exit the application. Re-trying the
> application on another node will be fruitless.
[jira] [Created] (YARN-9347) Do not retry containers killed for "running beyond physical memory limits"
David Mollitor created YARN-9347:
---------------------------------

             Summary: Do not retry containers killed for "running beyond physical memory limits"
                 Key: YARN-9347
                 URL: https://issues.apache.org/jira/browse/YARN-9347
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: applications
            Reporter: David Mollitor

If a container is killed for "running beyond physical memory limits," then do not re-launch this container. Simply exit the application. Re-trying the application on another node will be fruitless.