[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425430#comment-15425430 ]

Karthik Kambatla commented on YARN-5479:
----------------------------------------

I expect some performance improvements to come by way of global scheduling (YARN-5139), where we are considering using thread pools for some parallelizable operations.

> FairScheduler: Scheduling performance improvement
> -------------------------------------------------
>
>                 Key: YARN-5479
>                 URL: https://issues.apache.org/jira/browse/YARN-5479
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: He Tianyi
>            Assignee: He Tianyi
>
> Currently, the ResourceManager uses a single thread to handle async events for
> scheduling. As the number of nodes grows, more events need to be processed in
> time in FairScheduler. Also, an increased number of applications & queues slows
> down the processing of each single event.
> There are two cases in which slow processing of nodeUpdate events is problematic:
> A. Global throughput is lower than the number of nodes heartbeating in each round.
> This keeps resources from being allocated because of the inefficiency.
> B. Global throughput meets the need, but in some rounds, events from
> some nodes cannot be processed before the next heartbeat. This makes
> handling of burst requests inefficient (e.g. a newly submitted MapReduce
> application cannot get all of its tasks launched soon, even given enough resources).
> I am pretty sure some people will eventually encounter the problem once a single
> cluster is scaled to several thousand nodes (even with {{assignmultiple}}
> enabled).
> This issue proposes several performance optimizations in the
> FairScheduler {{nodeUpdate}} method. Specifically:
> A. Trading off fairness for efficiency, queue & app sorting can be skipped
> (or should this be called 'delayed sorting'?). We can either start another
> dedicated thread to do the sorting & updating, or perform the sorting only
> after the current result has been used several times (say, sort once in every 100
> calls).
> B. Performing calculations on {{Resource}} instances is expensive, since at
> least 2 objects ({{ResourceImpl}} and its proto builder) are created each time
> (using the 'immutable' APIs). The overhead can be eliminated with a
> lightweight implementation of Resource that does not instantiate a builder
> until necessary, because most instances are used as intermediate results in
> the scheduler instead of being exchanged via IPC. Also, {{createResource}}
> uses reflection, which can be replaced by a plain {{new}} (for scheduler
> usage only). Furthermore, perhaps we could 'intern' resources to avoid
> allocation.
> C. Other minor changes, such as moving the {{updateRootMetrics}} call to
> {{update}}, making root queue metrics eventually consistent (which may
> satisfy most needs), or introducing counters to {{getResourceUsage}}
> so that usage is updated incrementally instead of being recalculated each time.
> With A and B, I was looking at a 4x improvement in a cluster with 2K nodes.
> Suggestions? Opinions?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
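The 'delayed sorting' idea in option A of the issue description (sort once in every N calls and reuse the possibly stale order in between) could look roughly like the sketch below. This is only an illustration under stated assumptions: {{DelayedSortList}}, {{view}} and {{resortInterval}} are hypothetical names, not FairScheduler code, and the sketch is single-threaded, matching the single scheduler event thread described in the issue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not a YARN class): re-sorts only once per resortInterval reads. */
final class DelayedSortList<T> {
    private final List<T> items = new ArrayList<>();
    private final Comparator<? super T> order;
    private final int resortInterval;
    private int readsSinceSort;
    private int sortCount; // kept only so the effect can be observed

    DelayedSortList(Comparator<? super T> order, int resortInterval) {
        if (resortInterval < 1) {
            throw new IllegalArgumentException("resortInterval must be >= 1");
        }
        this.order = order;
        this.resortInterval = resortInterval;
        this.readsSinceSort = resortInterval; // forces a sort on the first read
    }

    /** Adding an item marks the order as stale so the next read re-sorts. */
    void add(T item) {
        items.add(item);
        readsSinceSort = resortInterval;
    }

    /**
     * Returns the items in their last sorted order, which may be stale.
     * The returned list is the internal one; callers must not modify it.
     */
    List<T> view() {
        if (readsSinceSort >= resortInterval) {
            items.sort(order);
            sortCount++;
            readsSinceSort = 0;
        }
        readsSinceSort++;
        return items;
    }

    int sortCount() {
        return sortCount;
    }
}
```

With {{resortInterval}} set to 100, a caller that reads the list on every heartbeat pays for one sort per 100 heartbeats. The price is the fairness trade-off discussed in this thread: between sorts, the order does not reflect usage changes.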
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422033#comment-15422033 ]

He Tianyi commented on YARN-5479:
---------------------------------

Yes, and it's a vast improvement. I simulated a scenario similar to production (with tens of queues and hundreds of running apps), and the benchmark showed heartbeat processing running 2x faster.
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422002#comment-15422002 ]

Xianyin Xin commented on YARN-5479:
-----------------------------------

[~He Tianyi], I hope YARN-4090 can provide some useful information: there, the locked resource usage was snapshotted, and performance improved greatly as a result.
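Related to resource usage, option C of the issue description suggests keeping counters so that {{getResourceUsage}} does not recalculate usage on every call. A minimal sketch of that idea follows, assuming a simple queue tree in which each node keeps a running total. {{QueueNode}}, {{allocate}} and {{release}} are hypothetical names, usage is reduced to a single memory figure, and this is not what YARN-4090 or FairScheduler actually implement.

```java
/** Hypothetical queue node (not a YARN class) that tracks usage incrementally. */
final class QueueNode {
    private final QueueNode parent; // null for the root queue
    private long usedMb;            // running total for this queue and its descendants

    QueueNode(QueueNode parent) {
        this.parent = parent;
    }

    /** Applies the change to this queue and every ancestor, once per allocation. */
    void allocate(long mb) {
        for (QueueNode q = this; q != null; q = q.parent) {
            q.usedMb += mb;
        }
    }

    void release(long mb) {
        allocate(-mb);
    }

    /** Constant-time read; nothing is summed over child queues here. */
    long getResourceUsage() {
        return usedMb;
    }
}
```

The cost moves from every read (a walk over all children) to every allocation or release (a walk up to the root), which pays off when reads, such as comparisons during sorting, far outnumber updates. The sketch is not thread-safe.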
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420796#comment-15420796 ]

sandflee commented on YARN-5479:
--------------------------------

will do, thx
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420549#comment-15420549 ]

He Tianyi commented on YARN-5479:
---------------------------------

Good point [~sandflee]. Would you share some performance evaluation based on that? Thanks.
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419963#comment-15419963 ]

sandflee commented on YARN-5479:
--------------------------------

It seems there is no need to compute minShare/isNeed/MinShareRatio/UseToWeightRatio in every Comparator#compare call; we could snapshot these values before doing the real sort.
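The snapshot-before-sort suggestion above can be sketched as follows: compute the derived values once per schedulable, then let the comparator read only the cached fields. Everything here is an assumption for illustration. {{SchedulableStub}}, {{SnapshotSort}} and the simplified formulas are hypothetical stand-ins, loosely modelled on the values named in the comment, and are not the actual FairScheduler comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a schedulable (queue or app); not the YARN interface. */
final class SchedulableStub {
    final String name;
    final long usageMb;
    final long minShareMb;
    final double weight;
    int keyComputations; // how many times the derived values were computed

    SchedulableStub(String name, long usageMb, long minShareMb, double weight) {
        this.name = name;
        this.usageMb = usageMb;
        this.minShareMb = minShareMb;
        this.weight = weight;
    }
}

final class SnapshotSort {
    /** Derived values, computed once per schedulable before the sort starts. */
    private static final class Key {
        final SchedulableStub s;
        final boolean needy;
        final double minShareRatio;
        final double useToWeightRatio;

        Key(SchedulableStub s) {
            s.keyComputations++;
            this.s = s;
            this.needy = s.usageMb < s.minShareMb;
            this.minShareRatio = (double) s.usageMb / Math.max(s.minShareMb, 1L);
            this.useToWeightRatio = s.usageMb / s.weight;
        }
    }

    /** Compares only the cached fields; nothing is recomputed per comparison. */
    private static final Comparator<Key> ORDER = (k1, k2) -> {
        if (k1.needy != k2.needy) {
            return k1.needy ? -1 : 1; // needy schedulables come first
        }
        int res = k1.needy
            ? Double.compare(k1.minShareRatio, k2.minShareRatio)
            : Double.compare(k1.useToWeightRatio, k2.useToWeightRatio);
        return res != 0 ? res : k1.s.name.compareTo(k2.s.name);
    };

    private SnapshotSort() {
    }

    /** Returns a new list in scheduling order; the input list is left untouched. */
    static List<SchedulableStub> sorted(List<SchedulableStub> input) {
        List<Key> keys = new ArrayList<>(input.size());
        for (SchedulableStub s : input) {
            keys.add(new Key(s)); // one snapshot per schedulable
        }
        keys.sort(ORDER);
        List<SchedulableStub> out = new ArrayList<>(keys.size());
        for (Key k : keys) {
            out.add(k.s);
        }
        return out;
    }
}
```

A sort performs on the order of n log n comparisons, so computing the derived values inside the comparator repeats the work many times per element, while the snapshot computes them exactly n times. A side benefit is that the values cannot change while the sort is running.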
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415975#comment-15415975 ]

Ray Chiang commented on YARN-5479:
----------------------------------

Sorry for the delay in replying.

YARN-5047 has the lead-in discussion, but YARN-5283 is refactoring the container scheduling into one method as well.

And I'm fine with an umbrella JIRA. The more we break this up into individual features, the easier it will be to cherry-pick and to judge the impact of specific changes. I'd just be aware of conflicts with the entirety of the refactoring planned at YARN-5046.
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413653#comment-15413653 ]

Jason Lowe commented on YARN-5479:
----------------------------------

bq. While doing so does not seemly cause any problem in production (fairness is slightly damaged locally, but within acceptable range.

What is acceptable for your production may not be acceptable to others. We're changing the requirements, and that could have ramifications for some users. It's hard to say, which is why I'd rather avoid going there unless absolutely necessary.

bq. Shall we make this issue an umbrella?

Yep, seems an appropriate place to gather performance improvements, although as mentioned above some of these may not be (or should not be) specific to the FairScheduler.
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412817#comment-15412817 ]

He Tianyi commented on YARN-5479:
---------------------------------

Thanks for the comments, [~rchiang] and [~jlowe].

bq. I'd be careful with having multiple implementations or multiple APIs for doing the same thing with Resource. Resource is used a lot of places in the Hadoop codebase and this could add confusion, even with accurate Javadocs.

Yes, multiple implementations would be confusing. I tried replacing {{ResourcePBImpl}} directly with the implementation I mentioned, and it looks like no other issue is raised. Maybe we could still stick to a single implementation by making it faster.

bq. The nodeUpdate() changes will conflict with YARN-5047 unless you plan on doing the same changes for CapacityScheduler and FifoScheduler.

Most changes can be made in {{attemptScheduling}}, which is specific to FairScheduler, so perhaps we can keep it that way.

bq. Minimally I think we should approach this as two (or more) separate JIRAs since there are two vastly different approaches to improving performance here.

Agreed. I will file separate JIRAs to address each aspect.

bq. I don't think we should start loosening the guarantees of the scheduler for performance reasons until we've exhausted the other ways we can improve performance

Certainly. However, the approach would be quite simple to implement, and doing so does not seem to cause any problem in production (fairness is slightly damaged locally, but within an acceptable range, and there is no effect globally, though this has not been carefully investigated yet). So if one must balance resource utilization against fairness (since resources are costly), providing such an option (e.g. through configuration) may be viable.

Shall we make this issue an umbrella? There are still many approaches to deliver better performance in FairScheduler.
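For readers following the Resource discussion, here is a minimal sketch of the lazy-builder idea from option B of the issue description: keep plain fields for arithmetic and create the protobuf builder only when the value actually has to be serialized. {{LightResource}} and {{ProtoBuilderStub}} are hypothetical names used for illustration only; the stub merely stands in for generated protobuf code, and none of this is the implementation that was tested against {{ResourcePBImpl}} in this thread.

```java
/** Hypothetical stand-in for a generated protobuf builder; counts instantiations. */
final class ProtoBuilderStub {
    static int created; // lets the sketch show when builders are allocated
    int memory;
    int vcores;

    ProtoBuilderStub() {
        created++;
    }
}

/** Hypothetical lightweight resource: plain fields, builder created on demand. */
final class LightResource {
    private final int memory;
    private final int vcores;
    private ProtoBuilderStub builder; // stays null until toProto() is first called

    LightResource(int memory, int vcores) {
        this.memory = memory;
        this.vcores = vcores;
    }

    int getMemory() {
        return memory;
    }

    int getVirtualCores() {
        return vcores;
    }

    /** Scheduler arithmetic allocates one small object and no builder. */
    LightResource add(LightResource other) {
        return new LightResource(memory + other.memory, vcores + other.vcores);
    }

    /** Only values that cross an IPC boundary pay for the builder, and only once. */
    ProtoBuilderStub toProto() {
        if (builder == null) {
            builder = new ProtoBuilderStub();
            builder.memory = memory;
            builder.vcores = vcores;
        }
        return builder;
    }
}
```

Intermediate results such as sums computed while comparing queues never call {{toProto()}}, so they never allocate a builder. The lazy field is not thread-safe as written; a real implementation would need to decide how instances are shared between threads.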
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412177#comment-15412177 ]

Jason Lowe commented on YARN-5479:

Agree the proposals are interesting. I'd love to get the overhead of Resource reduced since, as you and Ray point out, it's used everywhere.

At a minimum, I think we should approach this as two (or more) separate JIRAs, since there are two vastly different approaches to improving performance here. One optimizes the existing algorithm, while the other proposes changing the requirements to allow more optimization. I don't think we should start loosening the guarantees of the scheduler for performance reasons until we've exhausted the other ways we can improve performance. So personally I'd rather see the Resource-related improvements before the others that change the guarantees to which users have grown accustomed.
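To make the Resource discussion concrete: proposal B in the issue description asks for a lightweight Resource that does not instantiate a builder until necessary. The sketch below shows one way that could look. Every name in it (LazyResourceSketch, LightResource, WireForm, toWire) is hypothetical and is not a Hadoop API; WireForm only stands in for the proto builder, and the choice of mutable in-place arithmetic is an assumption, not something the issue specifies.

```java
/** Hypothetical sketch; none of these types are Hadoop classes. */
class LazyResourceSketch {

    /** Stand-in for the proto builder that backs a record sent over IPC. */
    static final class WireForm {
        final long memoryMb;
        final int vcores;

        WireForm(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Mutable resource value that allocates its wire form only on demand. */
    static final class LightResource {
        private long memoryMb;
        private int vcores;
        private WireForm wire; // stays null until toWire() is first called

        LightResource(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        /** In-place addition; this method allocates nothing. */
        LightResource add(LightResource other) {
            memoryMb += other.memoryMb;
            vcores += other.vcores;
            wire = null; // any cached wire form is now stale
            return this;
        }

        boolean hasWireForm() {
            return wire != null;
        }

        /** Builds the wire form on first use and caches it until the next change. */
        WireForm toWire() {
            if (wire == null) {
                wire = new WireForm(memoryMb, vcores);
            }
            return wire;
        }
    }

    public static void main(String[] args) {
        LightResource used = new LightResource(0, 0);
        LightResource container = new LightResource(1024, 1);
        for (int i = 0; i < 1000; i++) {
            used.add(container); // intermediate results never touch WireForm
        }
        System.out.println("wire form built during arithmetic: " + used.hasWireForm());
        WireForm w = used.toWire();
        System.out.println(w.memoryMb + " MB, " + w.vcores + " vcores");
    }
}
```

The point of the design is that scheduler-internal arithmetic pays only for two field updates, and the serialization object appears only when a value actually has to leave the process. Ray's concern about a second Resource implementation still applies to anything along these lines.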
[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411121#comment-15411121 ]

Ray Chiang commented on YARN-5479:

What you have in mind sounds interesting. I'd have to look at parts of the codebase more to comment further, but just some food for thought:

- I'd be careful with having multiple implementations or multiple APIs for doing the same thing with Resource. Resource is used in a lot of places in the Hadoop codebase, and this could add confusion, even with accurate Javadocs.
- The nodeUpdate() changes will conflict with YARN-5047 unless you plan on making the same changes for CapacityScheduler and FifoScheduler.
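For reference, the nodeUpdate() change under discussion (proposal A in the issue description, "sort once in every 100 calls") could be sketched roughly as follows. This is only an illustration of the delayed-sorting idea under assumed names: DelayedSortSketch, orderedForAssignment, and sortCount are hypothetical and do not exist in FairScheduler, and plain integers stand in for queues or apps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: reuse a cached ordering and re-sort only every N calls. */
class DelayedSortSketch<T> {
    private final List<T> items;
    private final Comparator<? super T> comparator;
    private final int resortInterval;
    private int callsSinceSort;
    private int sortCount;

    DelayedSortSketch(List<T> items, Comparator<? super T> comparator, int resortInterval) {
        if (resortInterval < 1) {
            throw new IllegalArgumentException("resortInterval must be >= 1");
        }
        this.items = new ArrayList<>(items);
        this.comparator = comparator;
        this.resortInterval = resortInterval;
        this.callsSinceSort = resortInterval; // forces a sort on the first call
    }

    /** Returns the cached order, re-sorting once it has been used resortInterval times. */
    List<T> orderedForAssignment() {
        if (callsSinceSort >= resortInterval) {
            items.sort(comparator);
            callsSinceSort = 0;
            sortCount++;
        }
        callsSinceSort++;
        return Collections.unmodifiableList(items);
    }

    int sortCount() {
        return sortCount;
    }

    public static void main(String[] args) {
        DelayedSortSketch<Integer> queues = new DelayedSortSketch<>(
                Arrays.asList(3, 1, 2), Comparator.<Integer>naturalOrder(), 100);
        for (int i = 0; i < 250; i++) {
            queues.orderedForAssignment(); // one call per simulated node heartbeat
        }
        System.out.println("sorts performed: " + queues.sortCount());
    }
}
```

Between two sorts the ordering is stale, which is exactly the fairness-for-efficiency trade-off Jason raises: allocations made from the cached order may not reflect usage changes since the last sort.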