I have not seen stack traces under autoscaling, so I'm not even sure what the error in question is. There is always a delay in acquiring a whole new executor in the cloud, as it usually means a new VM is provisioned. Spark treats the new executor like any other, available for executing tasks.
On Fri, Feb 4, 2022 at 4:28 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks for the info.
>
> My concern has always been with how Spark handles autoscaling (adding new
> executors) when the load pattern changes. I have tried to test this by
> setting the following parameters (Spark 3.1.2 on GCP):
>
> spark-submit --verbose \
>   ....... \
>   --conf spark.dynamicAllocation.enabled="true" \
>   --conf spark.shuffle.service.enabled="true" \
>   --conf spark.dynamicAllocation.minExecutors=2 \
>   --conf spark.dynamicAllocation.maxExecutors=10 \
>   --conf spark.dynamicAllocation.initialExecutors=4
>
> It is not very clear to me how Spark distributes tasks on the added
> executors, or what the source of the delay is. As you have observed, there
> is a delay in adding new resources and allocating tasks. Is that process
> efficient?
>
> Thanks
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Fri, 4 Feb 2022 at 03:04, Maksim Grinman <m...@resolute.ai> wrote:
>
>> It's actually on AWS EMR. The job bootstraps and runs fine -- the
>> autoscaling group is there to bring up a service that Spark will be
>> calling. Some code waits for the autoscaling group to come up before
>> continuing processing in Spark, since the Spark cluster will need to make
>> requests to the service in the autoscaling group. It takes several
>> minutes for the service to come up, and during the wait, Spark starts to
>> show these thread dumps, presumably because it thinks something is wrong
>> since the executor is busy waiting and not doing anything. The previous
>> version of Spark (2.4.4) did not do this.
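[On the ramp-up question: with dynamic allocation enabled, Spark requests executors only after tasks have been backlogged past a timeout, then escalates the request size exponentially each round, which accounts for part of the observed delay (the rest is VM provisioning). A minimal sketch of the relevant knobs, shown as a plain dict so the timings read side by side; the values are the documented Spark 3.1 defaults, and in a real job they would be passed via --conf or SparkConf:]

```python
# Sketch only: dynamic-allocation knobs that govern how fast Spark
# asks for new executors once tasks start backing up.
dyn_alloc = {
    # Executors are first requested after tasks have been pending
    # (backlogged) for this long.
    "spark.dynamicAllocation.schedulerBacklogTimeout": "1s",
    # Further requests fire at this interval while the backlog persists,
    # with the number requested doubling each round (1, 2, 4, 8, ...).
    "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout": "1s",
    # Idle executors are released after this long without running tasks.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}

for key, value in dyn_alloc.items():
    print(f"--conf {key}={value}")
```

[So with the defaults, the scheduling side of the ramp-up is quick (seconds); the multi-minute part is almost always the cloud provider bringing up the VM behind the new executor.]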
>>
>> On Thu, Feb 3, 2022 at 6:59 PM Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>>
>>> Sounds like you are running this on a Google Dataproc cluster (Spark
>>> 3.1.2) with an autoscaling policy?
>>>
>>> Can you describe whether this happens before Spark starts a new job on
>>> the cluster or somewhere halfway through processing an existing job?
>>>
>>> Also, does the job involve Spark Structured Streaming?
>>>
>>> HTH
>>>
>>> On Thu, 3 Feb 2022 at 21:29, Maksim Grinman <m...@resolute.ai> wrote:
>>>
>>>> We've got a Spark task that, after some processing, starts an
>>>> autoscaling group and waits for it to be up before continuing
>>>> processing. While waiting for the autoscaling group, Spark starts
>>>> throwing full thread dumps, presumably at the
>>>> spark.executor.heartbeatInterval. Is there a way to prevent the thread
>>>> dumps?
>>>>
>>>> --
>>>> Maksim Grinman
>>>> VP Engineering
>>>> Resolute AI
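[The "wait for the autoscaling group" step described above amounts to a bounded polling loop. A minimal, Spark-agnostic sketch in Python, where `service_ready` is a hypothetical stand-in for whatever health check the job actually performs (e.g. an HTTP probe against the service behind the autoscaling group):]

```python
import time

def wait_for_service(check, timeout_s=600, poll_s=15):
    """Poll check() until it returns True or timeout_s elapses.

    `check` is a placeholder for the job's real readiness probe;
    it is not a Spark API.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll_s)
    raise TimeoutError(f"service not up after {timeout_s}s")

# Demo with a stub that becomes ready on the third probe:
calls = {"n": 0}
def service_ready():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for_service(service_ready, timeout_s=5, poll_s=0))  # prints True
```

[As for silencing the dumps themselves: `spark.executor.heartbeatInterval` (default 10s) controls how often executors report to the driver, and if the dumps really do track that interval, lengthening it (while keeping it well below `spark.network.timeout`, or the driver will consider executors lost) might reduce the noise -- but that is speculation, not a confirmed fix for the behavior described above.]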