[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906790#comment-17906790 ]

ASF GitHub Bot commented on YARN-7327:
--------------------------------------

brumi1024 merged PR #7138:
URL: https://github.com/apache/hadoop/pull/7138

> CapacityScheduler: Allocate containers asynchronously by default
> -----------------------------------------------------------------
>
> Key: YARN-7327
> URL: https://issues.apache.org/jira/browse/YARN-7327
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Craig Ingram
> Assignee: Syed Shameerur Rahman
> Priority: Trivial
> Labels: pull-request-available
> Attachments: async-scheduling-results.md, schedule-async.png, spark-on-yarn-schedule-async.ipynb, yarn-async-scheduling.png
>
> I was recently doing some research into Spark on YARN's startup time and observed slow, synchronous allocation of containers/executors. I was testing on a 4-node bare-metal cluster with 48 cores and 128 GB of memory per node. YARN was only allocating about 3 containers per second. Moreover, when starting 3 Spark applications at the same time, each requesting 44 containers, the first application would get all 44 requested containers, then the next application would start getting containers, and so on.
>
> From looking at the code, it appears this is by design. There is an undocumented configuration variable that will enable asynchronous allocation of containers. I'm sure I'm missing something, but why is this not the default? Is there a bug or race condition in this code path? I've done some testing with it; it has been working and is significantly faster.
>
> Here's the config:
> `yarn.scheduler.capacity.schedule-asynchronously.enable`
>
> Any help understanding this would be appreciated.
>
> Thanks,
> Craig
>
> If you're curious about the performance difference with this setting, here are the results.
>
> The following tool was used for the benchmarks: https://github.com/SparkTC/spark-bench
>
> h2. async scheduler research
> The goal of this test is to determine whether running Spark on YARN with async scheduling of containers reduces the amount of time required for an application to receive all of its requested resources. This setting should also reduce the overall runtime of short-lived applications/stages or notebook paragraphs. This setting could prove crucial to achieving optimal performance when sharing resources on a cluster with dynalloc (dynamic allocation) enabled.
>
> h3. Test Setup
> Update /etc/hadoop/conf/capacity-scheduler.xml (directly or through Ambari) between runs:
> `yarn.scheduler.capacity.schedule-asynchronously.enable=true|false`
> The conf files request executor counts of:
> * 2
> * 20
> * 50
> * 100
>
> The apps are submitted to the default queue on each cluster, which caps at 48 cores on dynalloc and 72 cores on baremetal. The default queue was expanded for the last two tests on baremetal so it could potentially take advantage of all 144 cores.
>
> h3. Test Environments
> h4. dynalloc
> 4 VMs in Fyre (1 master, 3 workers)
> 8 CPUs / 16 GB per node
> model name: QEMU Virtual CPU version 2.5+
>
> h4. baremetal
> 4 baremetal instances in Fyre (1 master, 3 workers)
> 48 CPUs / 128 GB per node
> model name: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>
> h3. Using spark-bench with timedsleep workload, sync
> h4. dynalloc
> || requested containers || avg || stdev ||
> | 2 | 23.814900 | 1.110725 |
> | 20 | 29.770250 | 0.830528 |
> | 50 | 44.486600 | 0.593516 |
> | 100 | 44.337700 | 0.490139 |
>
> h4. baremetal - 2 queues splitting cluster, 72 cores each
> || requested containers || avg || stdev ||
> | 2 | 14.827000 | 0.292290 |
> | 20 | 19.613150 | 0.155421 |
> | 50 | 30.768400 | 0.083400 |
> | 100 | 40.931850 | 0.092160 |
>
> h4. baremetal - 1 queue to rule them all - 144 cores
> || requested containers || avg || stdev ||
> | 2 | 14.833050 | 0.334061 |
> | 20 | 19.575000 | 0.212836 |
> | 50 | 30.765350 | 0.111035 |
> | 100 | 41.763300 | 0.182700 |
>
> h3. Using spark-bench with timedsleep workload, async
> h4. dynalloc
> || requested containers || avg || stdev ||
> | 2 | 22.575150 | 0.574296 |
> | 20 | 26.904150 | 1.244602 |
> | 50 | 44.721800 | 0.655388 |
> | 100 | 44.57 | 0.514540 |
>
> h5. 2nd run
> || requested containers || avg || stdev ||
> | 2 | 22.441200 | 0.715875 |
> | 20 | 26.683400 | 0.583762 |
> | 50 | 44.227250 | 0.512568 |
> | 100 | 44.238750 | 0.329712 |
>
> h4. baremetal - 2 queues splitting cluster, 72 cores each
> || requested containers || avg || stdev ||
> | 2 | 12.902350 | 0.125505 |
> | 20 | 13.830600 | 0.169598 |
> | 50 | 16.738050 | 0.265091 |
> | 100 | 40.654500 | 0.111417 |
>
> h4. baremetal - 1 queue to rule them all - 144 cores
> || requested containers || avg || stdev ||
> | 2 | 12.987150 | 0.118169 |
> | 20 | 13.837150 | 0.145871 |
> | 50 | 16.816300 | 0.253437 |
> | 100 | 23.113450 | 0.320744 |
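For anyone reproducing the Test Setup quoted above, the toggle is a single property in capacity-scheduler.xml. A minimal sketch of the entry (the property name is the one given in the description; `true` is the default this issue introduces, `false` restores heartbeat-driven scheduling):

{code:xml}
<!-- /etc/hadoop/conf/capacity-scheduler.xml -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
    <!-- false: containers are allocated only on node heartbeats (the old
         default); true: a dedicated scheduler thread allocates continuously. -->
    <value>true</value>
  </property>
</configuration>
{code}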
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906791#comment-17906791 ]

ASF GitHub Bot commented on YARN-7327:
--------------------------------------

brumi1024 commented on PR #7138:
URL: https://github.com/apache/hadoop/pull/7138#issuecomment-2551713490

Thanks @shameersss1 for the change, @TaoYang526 @zeekling @slfan1989 for the reviews, merged to trunk.
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906782#comment-17906782 ]

ASF GitHub Bot commented on YARN-7327:
--------------------------------------

slfan1989 commented on PR #7138:
URL: https://github.com/apache/hadoop/pull/7138#issuecomment-2551591139

@shameersss1 Thank you for your contribution. From my perspective, +1. A big thanks to @brumi1024 for helping review the code.
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906688#comment-17906688 ]

ASF GitHub Bot commented on YARN-7327:
--------------------------------------

shameersss1 commented on PR #7138:
URL: https://github.com/apache/hadoop/pull/7138#issuecomment-2550920952

@brumi1024 - Please let me know if there are any more concerns. Now that the dependent YARN issue is also merged, can we take it forward?
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905045#comment-17905045 ]

Syed Shameerur Rahman commented on YARN-7327:
---------------------------------------------

[~zhengchenyu] - Yes, much of your concern was valid in older Hadoop versions, where there were too many locks. https://issues.apache.org/jira/browse/YARN-5139 was done to fix them. The perf details can be found here: [^YARN-5139-Concurrent-scheduling-performance-report.pdf] (reducing locking improved sync performance as well).
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905042#comment-17905042 ]

Chenyu Zheng commented on YARN-7327:
------------------------------------

[~srahman] Thanks for your reply! In 2016, our company had a 5K+ node YARN cluster, while my current company's largest YARN cluster is only 3K+, so I have missed some of the recent work on scheduling performance. Since YARN-10352 is committed, (2) is no longer a problem.

As for lock contention, I can only remember the general situation, which may be tied to the specific version. Asynchronous scheduling means more opportunities to trigger scheduling, which means more frequent lock holding. In a super-large-scale cluster, a scheduling cycle may take a long time. Note: in 2016 our cluster's network card performance was poor, so we set max.assign to 5, which resulted in a low hit rate and a scheduling cycle of nearly 20-30 seconds. However, it is still challenging to enable asynchronous scheduling on a large-scale cluster. I'm not sure about the current version; maybe some of these problems are no longer problems now.
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905029#comment-17905029 ]

Syed Shameerur Rahman commented on YARN-7327:
---------------------------------------------

[~zhengchenyu] - Thanks a lot for sharing your experience. I would like to highlight the following:
# The proposed change here is only for the Capacity Scheduler.
# Async scheduling is fairly stable now compared to past years. Thanks to the community for actively patching the bugs.

> (1) Lock competition became more intense, so I thought it might be better to solve the scheduling performance problem by improving the hit rate.

Can you describe what kind of lock competition you were referring to here?

> (2) Asynchronous scheduling separates the scheduling process from the heartbeat. This means that resources will be allocated to the downed but not timed-out machine, and it must wait for the Node to time out before re-scheduling.

This is already handled in YARN-10352: if a node misses 2 heartbeats (by default), it is not considered for scheduling (see the sketch below).
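To illustrate the YARN-10352 behavior mentioned above, here is a minimal sketch of a heartbeat-staleness guard. The class and method names are hypothetical, not the actual Hadoop code; only the idea (skip nodes whose heartbeats have gone quiet) comes from the comment above.

{code:java}
// Illustrative sketch only -- not the YARN-10352 implementation.
// Async scheduling skips nodes whose heartbeats have gone quiet, so
// containers are not placed on a machine that is down but not yet
// marked as timed out.
final class HeartbeatGuard {
  private final long heartbeatIntervalMs;
  private final int missedHeartbeatsTolerated; // 2 by default, per the comment above

  HeartbeatGuard(long heartbeatIntervalMs, int missedHeartbeatsTolerated) {
    this.heartbeatIntervalMs = heartbeatIntervalMs;
    this.missedHeartbeatsTolerated = missedHeartbeatsTolerated;
  }

  /** Returns true when the node should be excluded from async scheduling. */
  boolean shouldSkipNode(long lastHeartbeatMs, long nowMs) {
    long quietMs = nowMs - lastHeartbeatMs;
    return quietMs > heartbeatIntervalMs * missedHeartbeatsTolerated;
  }
}
{code}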
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904995#comment-17904995 ]

Chenyu Zheng commented on YARN-7327:
------------------------------------

From my experience: I enabled this feature on the Fair Scheduler, based on version 2.7.1, around 2016. At that time it was called continuous scheduling. After enabling the feature, scheduling performance did improve, but there were some problems, such as:
(1) Lock competition became more intense, so I thought it might be better to solve the scheduling performance problem by improving the hit rate.
(2) Asynchronous scheduling separates the scheduling process from the heartbeat. This means that resources will be allocated to a machine that is down but not yet timed out, and the scheduler must wait for the node to time out before re-scheduling those resources.
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904824#comment-17904824 ]

ASF GitHub Bot commented on YARN-7327:
--------------------------------------

zeekling commented on code in PR #7138:
URL: https://github.com/apache/hadoop/pull/7138#discussion_r1880390597

## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAsyncScheduling.java:

@@ -927,7 +927,11 @@ public void testReleaseOutdatedReservedContainer() throws Exception {
      * First proposal should be accepted, second proposal should be rejected
      * because it try to release an outdated reserved container
      */
-    MockRM rm1 = new MockRM();
+    // disable async-scheduling for simulating complex scene
+    Configuration disableAsyncConf = new Configuration(conf);
+    disableAsyncConf.setBoolean(
+        CapacitySchedulerConfiguration.SCHEDULE_ASYNCHRONOUSLY_ENABLE, false);

Review Comment:
OK, got it.
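The excerpt ends mid-hunk; presumably the test then passes the overridden configuration into the mock ResourceManager, along these lines (a hypothetical reconstruction of the elided lines, not part of the quoted diff):

{code:java}
// Hypothetical continuation of the hunk above: construct the test RM with
// async scheduling explicitly disabled.
MockRM rm1 = new MockRM(disableAsyncConf);
{code}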
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904483#comment-17904483 ]

ASF GitHub Bot commented on YARN-7327:
--------------------------------------

shameersss1 commented on PR #7138:
URL: https://github.com/apache/hadoop/pull/7138#issuecomment-2531512162

@brumi1024 - Thanks for looking into this.

> what is the reason behind changing the default of this setting?

1. The current default scheduling mechanism is synchronous (node-heartbeat driven), which is not efficient when there is a large number of containers to be allocated.
2. It also has additional issues, e.g. scheduling won't happen if a node heartbeat is lost due to a network issue.
3. @wangdatan did an amazing job of making async scheduling production-ready; refer to https://issues.apache.org/jira/browse/YARN-7327?focusedCommentId=16205259&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16205259 for benchmark details.
4. The above benchmark shows async scheduling throughput is better than sync scheduling.

Hence the proposal here is to change the default scheduling strategy for the Capacity Scheduler from synchronous to asynchronous (the sketch below contrasts the two modes). Companies like Alibaba Cloud already use this in production: https://www.alibabacloud.com/help/en/emr/emr-on-ecs/user-guide/yarn-schedulers

@brumi1024 - Do you think there is any blocker/issue in enabling it by default?
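To make point 1 concrete, here is a deliberately simplified contrast between the two modes. This is a sketch with hypothetical names, not the actual CapacityScheduler code, which is far more involved:

{code:java}
import java.util.List;
import java.util.function.Consumer;

// Simplified sketch of sync vs. async allocation; names are illustrative.
final class SchedulingModes {

  // Sync mode: an allocation attempt runs only when a node heartbeat
  // arrives, so allocation throughput is bounded by heartbeat frequency.
  static void onNodeHeartbeat(String nodeId, Consumer<String> tryAllocate) {
    tryAllocate.accept(nodeId);
  }

  // Async mode: a dedicated daemon thread keeps offering every node to the
  // scheduler, decoupling the allocation rate from heartbeat arrival.
  static Thread asyncScheduleLoop(List<String> nodeIds,
      Consumer<String> tryAllocate, long pauseMs) {
    Thread t = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        nodeIds.forEach(tryAllocate);
        try {
          Thread.sleep(pauseMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "async-schedule-loop");
    t.setDaemon(true);
    return t;
  }
}
{code}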
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890291#comment-17890291 ]

Tao Yang commented on YARN-7327:
--------------------------------

[~srahman] Apologies for the lack of clarity earlier. I am not referring to specific issues, but rather to the reaction of the scheduler when a critical thread exits unexpectedly. I think this is a potential risk for less experienced users after async-scheduling is enabled by default. I just found that YARN-10058 is working on this; I would like to push that issue forward.
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890109#comment-17890109 ]

Syed Shameerur Rahman commented on YARN-7327:
---------------------------------------------

[~Tao Yang] - I think your change to fix the NPE is already merged, right? Is there any other blocker to turning it ON by default?
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890100#comment-17890100 ] Tao Yang commented on YARN-7327: Hi [~srahman]. We encountered an issue when enabling asynchronous scheduling: if an exception, such as a {{NullPointerException}}, is thrown during the scheduling process, it can cause the scheduling thread to exit unexpectedly, preventing HA (High Availability) failover from being triggered. As a result, the ResourceManager (RM) hangs indefinitely until it is manually restarted. If we plan to enable asynchronous scheduling by default, I think it's crucial to address this issue as well.
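The failure mode described above can be illustrated with a minimal Java sketch. All names here are hypothetical, not the actual CapacityScheduler internals: the point is that an allocation loop which lets an exception escape silently kills its thread while the RM process stays alive, so HA failover never triggers; wrapping each pass and escalating fatal errors is one way to avoid the hang.
{code}
// Illustrative sketch only, assuming a hypothetical AllocationPass interface.
public class AsyncSchedulingThread extends Thread {
  interface AllocationPass { void run() throws Exception; }

  private final AllocationPass pass;

  AsyncSchedulingThread(AllocationPass pass) {
    this.pass = pass;
    setName("async-scheduling");
    setDaemon(true);
  }

  @Override
  public void run() {
    while (!isInterrupted()) {
      try {
        pass.run();                           // one scheduling iteration
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();   // normal shutdown path
      } catch (Throwable t) {
        // Without this catch, an NPE would silently terminate the thread:
        // the RM process stays up, so HA failover is never triggered.
        // Escalating to a fatal exit (or a failover hook) lets a standby
        // RM take over instead of hanging indefinitely.
        System.err.println("Fatal error in scheduling thread: " + t);
        System.exit(1); // illustrative escalation, not the real RM behavior
      }
    }
  }
}
{code}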
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889564#comment-17889564 ] Shilun Fan commented on YARN-7327: -- [~srahman] Thank you for your interest in this JIRA! I'm not very familiar with the Capacity Scheduler, as I use the Fair Scheduler more often. However, I know some team members who are familiar with the Capacity Scheduler. Feel free to contribute code, and I will do my best to connect you with the relevant members to help review it.
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889563#comment-17889563 ] Syed Shameerur Rahman commented on YARN-7327: - [~leftnoteasy] - Given that asynchronous scheduling is comparatively stable now, any thoughts on enabling it by default? Do you foresee any issues? If not, I can pick up the work to make it the default. cc: [~slfan1989]
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281840#comment-16281840 ] Craig Ingram commented on YARN-7327: I finally got around to trying out asynchronous container allocation in Hadoop 2.9 and 3.0-SNAPSHOT (built from master a few days ago) with Spark 2.3-SNAPSHOT (built the same day as Hadoop). This is all running on the same hardware described above (I did not repeat the tests on VMs). The test results are attached, as is the Jupyter notebook I used to create them. I did change the test slightly from what was done above by tweaking the core counts requested each round: it now requests 16, 32, 64, 128, and 256, whereas it requested 2, 20, 50, and 100 before. I reran the 2.7.3 tests as well. I also ran the 2.9 test with 4 threads, and it came out basically the same as the 3.0 test with 4 threads, so I did not include it in the graphs.
||Legend||Test||
|sync3|synchronous 3.0-SNAPSHOT|
|sync29|synchronous 2.9|
|sync273|synchronous 2.7.3|
|async1-3|async with 1 thread on 3.0-SNAPSHOT|
|async1-29|async with 1 thread on 2.9|
|async1-273|async with 1 thread on 2.7.3|
|async2-3|async with 2 threads on 3.0-SNAPSHOT|
|async4-3|async with 4 threads on 3.0-SNAPSHOT|
|async8-3|async with 8 threads on 3.0-SNAPSHOT|
|async16-3|async with 16 threads on 3.0-SNAPSHOT|
[^async-scheduling-results.md] [^schedule-async.png] [^spark-on-yarn-schedule-async.ipynb]
While the numbers aren't as great as I was hoping (especially at higher thread pool counts), it's still a big improvement. I was mainly surprised by the flattening out of container allocations per second at higher container counts. I was thinking of giving the RM more memory, or at least looking into whether it is under GC pressure. Is there anywhere else I should look to tune this? Thanks!
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205259#comment-16205259 ] Wangda Tan commented on YARN-7327: -- [~CraigI], If you want to try the latest async scheduling in the Capacity Scheduler, you don't need to change application code. It is a global scheduler config in capacity-scheduler.xml:
{code}
yarn.scheduler.capacity.schedule-asynchronously.enable
true
{code}
In addition, YARN 2.9.0/3.0.0 supports specifying multiple threads (the default is 1) to allocate containers:
{code}
yarn.scheduler.capacity.schedule-asynchronously.maximum-threads
4
{code}
From the test report https://issues.apache.org/jira/secure/attachment/12831662/YARN-5139-Concurrent-scheduling-performance-report.pdf, the multiple-thread + async approach can improve scheduler throughput (and shorten allocation delays) significantly. Please let me know how it goes on your side; I can help answer any questions you have.
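As a sketch of how these two keys would be consumed, here is a small example using Hadoop's generic Configuration API (the property names are the ones quoted above; the defaults shown are assumptions, though false and 1 match the behavior described in this thread):
{code}
// Hedged sketch: read the async-scheduling settings from capacity-scheduler.xml.
import org.apache.hadoop.conf.Configuration;

public class AsyncSchedulingConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource("capacity-scheduler.xml"); // loaded from the classpath

    boolean asyncEnabled = conf.getBoolean(
        "yarn.scheduler.capacity.schedule-asynchronously.enable", false);
    int maxThreads = conf.getInt(
        "yarn.scheduler.capacity.schedule-asynchronously.maximum-threads", 1);

    System.out.println("async enabled: " + asyncEnabled
        + ", scheduling threads: " + maxThreads);
  }
}
{code}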
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204359#comment-16204359 ] Eric Yang commented on YARN-7327: - The purpose of the artificial synchronous delay was to ensure all nodes have the same opportunity to talk to the JobTracker or Application Master. This ensures a somewhat even distribution of workload across all nodes, instead of faster nodes ending up with the majority of the data. With faster networks and lower latency, it might be reasonable to shorten the heartbeat frequency to improve container allocation response time. Asynchronous allocation will guarantee that the fastest node ends up with the most data. I don't have a preference on the default, but large-scale clusters are likely to use the synchronous delay to prevent a large skew of data in the cluster.
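To make the contrast concrete, here is a minimal, hypothetical Java sketch (names are illustrative, not CapacityScheduler internals): heartbeat-driven allocation paces itself to node heartbeats, while the async loop sweeps nodes continuously, favoring whichever node frees capacity first.
{code}
// Illustrative sketch only; not the actual scheduler implementation.
import java.util.List;

class AllocationModes {
  interface Node { void allocatePendingRequests(); }

  // Synchronous mode: allocation happens only when a node heartbeats, so
  // throughput is bounded by heartbeat frequency and every node gets an
  // equal opportunity to receive work.
  static void onNodeHeartbeat(Node node) {
    node.allocatePendingRequests();
  }

  // Asynchronous mode: a background thread sweeps all nodes continuously,
  // so a faster node that reports free capacity first tends to be filled
  // first, which produces the skew described above.
  static void asyncSchedulingLoop(List<Node> nodes) throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      for (Node node : nodes) {
        node.allocatePendingRequests();
      }
      Thread.sleep(5); // brief pause between sweeps (interval is illustrative)
    }
  }
}
{code}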
[jira] [Commented] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default
[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204352#comment-16204352 ] Craig Ingram commented on YARN-7327: Thanks Arun. I'll look into what it'll take to get a test environment set up with the latest YARN. I'm not sure whether Spark will require any modifications to try it out at this point. I believe I can set up some benchmarks to demonstrate whether there is any impact when the cluster is under load. It would be using Spark, so I'm not sure if that would help the general YARN use case. I like the idea of opportunistic containers, but I think the way Spark's scheduler farms out tasks is already doing something similar (I'll take a closer look, though).