[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-3136:
    Fix Version/s: 2.8.0

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Labels: 2.7.2-candidate
> Fix For: 2.8.0, 2.7.2, 3.0.0-alpha1
>
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 00012-YARN-3136.patch, 00013-YARN-3136.patch,
> 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch,
> 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch,
> 0008-YARN-3136.patch, 0009-YARN-3136.patch, YARN-3136.branch-2.7.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs
> stuck waiting for the scheduler lock trying to call getTransferredContainers.
> The scheduler lock is highly contended, especially on a large cluster with
> many nodes heartbeating, and it would be nice if we could find a way to
> eliminate the need to grab this lock during this call. We've already done
> similar work during AM allocate calls to make sure they don't needlessly grab
> the scheduler lock, and it would be good to do so here as well, if possible.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-3136:
    Fix Version/s: (was: 2.8.0)

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Labels: 2.7.2-candidate
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 00012-YARN-3136.patch, 00013-YARN-3136.patch,
> 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch,
> 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch,
> 0008-YARN-3136.patch, 0009-YARN-3136.patch, YARN-3136.branch-2.7.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-3136:
    Attachment: YARN-3136.branch-2.7.patch

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Labels: 2.7.2-candidate
> Fix For: 2.8.0, 2.7.2
>
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 00012-YARN-3136.patch, 00013-YARN-3136.patch,
> 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch,
> 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch,
> 0008-YARN-3136.patch, 0009-YARN-3136.patch, YARN-3136.branch-2.7.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-3136:
    Fix Version/s: 2.7.2

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Labels: 2.7.2-candidate
> Fix For: 2.8.0, 2.7.2
>
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 00012-YARN-3136.patch, 00013-YARN-3136.patch,
> 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch,
> 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch,
> 0008-YARN-3136.patch, 0009-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-3136:
    Labels: 2.7.2-candidate (was: )

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Labels: 2.7.2-candidate
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 00012-YARN-3136.patch, 00013-YARN-3136.patch,
> 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch,
> 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch,
> 0008-YARN-3136.patch, 0009-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-3136:
    Attachment: 00013-YARN-3136.patch

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 00012-YARN-3136.patch, 00013-YARN-3136.patch,
> 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch,
> 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch,
> 0008-YARN-3136.patch, 0009-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 00012-YARN-3136.patch

Rebased against trunk. Also changed the findbugs suppression for the getTransferredContainers method.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 00012-YARN-3136.patch, 0002-YARN-3136.patch,
> 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch,
> 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch,
> 0009-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 00011-YARN-3136.patch

Checking jenkins again

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch,
> 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch,
> 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: (was: 00011-YARN-3136.patch)

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch,
> 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch,
> 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 00011-YARN-3136.patch

Thank you [~jianhe]. I have added the check to the exclude file. Will kick off Jenkins with this patch.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 00011-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch,
> 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch,
> 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 00010-YARN-3136.patch

Yes [~jianhe], I added that to fix findbugs, which is not needed. I updated the patch as per the initial understanding. Kindly check.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch,
> 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch,
> 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch,
> 0008-YARN-3136.patch, 0009-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 0009-YARN-3136.patch

Hi [~jlowe] and [~jianhe],
I used a ConcurrentMap for 'applications', but findbugs now reports warnings for non-synchronized access to this map. I hope that is acceptable; please share your opinion.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch,
> 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch,
> 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch,
> 0009-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 0008-YARN-3136.patch

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch,
> 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch,
> 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 0007-YARN-3136.patch

Uploading patch to check findbugs warnings.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch,
> 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch,
> 0006-YARN-3136.patch, 0007-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 0006-YARN-3136.patch

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch,
> 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch,
> 0006-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 0005-YARN-3136.patch

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch,
> 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 0004-YARN-3136.patch

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch,
> 0003-YARN-3136.patch, 0004-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 0003-YARN-3136.patch

Yes [~jlowe]. It's good to keep backward compatibility.
bq. can be overridden in derived schedulers
A new method named *getSchedulerApplication* can be added in AbstractYarnScheduler; by default it takes the lock to access the application object from the *applications* map. Later, in CS or another scheduler, we can override it to remove the lock.
I have attached a patch for this. Please check whether it is the same as what you mentioned.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch,
> 0003-YARN-3136.patch
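The getSchedulerApplication proposal in the comment above can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the code in any of the attached patches: the class names SketchBaseScheduler, SketchLegacyScheduler and SketchLockFreeScheduler, the String-typed map, and the lookupForRegistration and addApplication helpers are all hypothetical stand-ins, and the "scheduler lock" is modelled as the object's monitor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for AbstractYarnScheduler; NOT the real Hadoop class.
abstract class SketchBaseScheduler {
    // Stand-in for the *applications* map: application id -> application state.
    protected final Map<String, String> applications;

    protected SketchBaseScheduler(Map<String, String> applications) {
        this.applications = applications;
    }

    // Default behaviour: take the scheduler lock (modelled here as this
    // object's monitor) before reading the map, so schedulers that still
    // use a plain HashMap stay safe.
    protected synchronized String getSchedulerApplication(String appId) {
        return applications.get(appId);
    }

    // Writers keep using the scheduler lock in this sketch.
    synchronized void addApplication(String appId, String app) {
        applications.put(appId, app);
    }

    // Hypothetical caller on the AM registration path; it only goes through
    // the overridable accessor and takes no lock of its own.
    String lookupForRegistration(String appId) {
        return getSchedulerApplication(appId);
    }
}

// A custom scheduler that was written before the change: plain HashMap,
// inherited locked accessor, behaviour unchanged.
class SketchLegacyScheduler extends SketchBaseScheduler {
    SketchLegacyScheduler() {
        super(new HashMap<>());
    }
}

// A scheduler that opts in: its map is a ConcurrentHashMap, so the override
// can drop the lock for the read.
class SketchLockFreeScheduler extends SketchBaseScheduler {
    SketchLockFreeScheduler() {
        super(new ConcurrentHashMap<>());
    }

    @Override
    protected String getSchedulerApplication(String appId) {
        // Not synchronized: ConcurrentHashMap.get is safe without the lock.
        return applications.get(appId);
    }
}
```

The point of the split is that existing custom schedulers keep the locked default, while a scheduler that has moved to a concurrent map can remove the lock from the read path in one place.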
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 0002-YARN-3136.patch

Hi [~jlowe] [~jianhe],
The *applications* map is now a ConcurrentMap, which makes concurrent access safe. However, as mentioned in previous comments, this can cause issues for existing custom schedulers that don't use a ConcurrentMap.
Please share your comments.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch
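To illustrate why a ConcurrentMap for *applications* helps with the contention described in this issue, here is a small self-contained sketch. It is an assumption-laden illustration, not the patch: SketchApplicationsRegistry, holdLockUntil and readWhileLockHeld are hypothetical names, the map holds Strings rather than real scheduler application objects, and the scheduler lock is modelled as the object's monitor. One thread holds the lock (as a busy scheduler would during node heartbeats) while another thread reads the map without waiting for it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of a scheduler whose applications map is concurrent.
class SketchApplicationsRegistry {
    private final ConcurrentMap<String, String> applications =
            new ConcurrentHashMap<>();

    // Writers still take the scheduler lock in this sketch.
    synchronized void addApplication(String appId, String app) {
        applications.put(appId, app);
    }

    // Simulates a long-running operation that holds the scheduler lock.
    synchronized void holdLockUntil(CountDownLatch acquired, CountDownLatch release)
            throws InterruptedException {
        acquired.countDown();   // signal that the lock is now held
        release.await();        // keep holding it until told to let go
    }

    // Reader on the registration path: no scheduler lock required.
    String getApplication(String appId) {
        return applications.get(appId);
    }

    // Reads the map while another thread is holding the scheduler lock.
    static String readWhileLockHeld() {
        final SketchApplicationsRegistry registry = new SketchApplicationsRegistry();
        registry.addApplication("application_1", "attempt-state-1");

        final CountDownLatch acquired = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            try {
                registry.holdLockUntil(acquired, release);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        holder.start();
        try {
            acquired.await();   // the lock is held by the other thread from here on
            // A synchronized reader would block at this point; this one does not.
            String seen = registry.getApplication("application_1");
            release.countDown();
            holder.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        }
    }
}
```

The compatibility concern raised in the comment is about the field's declared type: a custom scheduler that assigns some other Map implementation to the shared field would no longer fit once the field is declared as a ConcurrentMap, which this sketch does not attempt to model.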
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3136:
    Attachment: 0001-YARN-3136.patch

Attaching a patch as discussed. Kindly check.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-3136:
    Issue Type: Sub-task (was: Bug)
    Parent: YARN-3091

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.6.0
> Reporter: Jason Lowe