Hey Mich,

This cluster is running Spark 2.4.6 on EMR.

On Mon, Feb 27, 2023 at 12:20 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> What is the Spark version, and what type of cluster is it: Spark on
> Dataproc or something else?
>
> HTH
>
> On Mon, 27 Feb 2023 at 09:06, murat migdisoglu <murat.migdiso...@gmail.com> wrote:
>
>> On an auto-scaling cluster that uses YARN as the resource manager, we observed
>> that after moving to larger instance types and decreasing the number of worker
>> nodes (keeping the total CPU/memory capacity of the cluster identical), the
>> number of tasks for the same Spark job spikes.
>>
>> The same Spark job, with the same Spark settings (dynamic allocation is on),
>> spins up 4-5 times as many tasks, and correspondingly we see 4-5 times as many
>> executors being allocated.
>>
>> As far as I understand, dynamic allocation decides to start a new executor when
>> it sees pending tasks queuing up (the relevant settings are sketched at the end
>> of this thread). But I don't understand why the same Spark application, with
>> identical input files, runs 4-5 times as many tasks.
>>
>> Any clues would be much appreciated. Thank you.
>>
>> Murat
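For reference, a minimal Scala sketch of the settings that govern the two behaviours
described in the quoted question: how quickly dynamic allocation asks for executors
when tasks queue up, and what determines how many tasks a stage gets in the first
place. The app name and values below are illustrative, not the settings of the job
in question; it assumes a SparkSession-based job on Spark 2.4 / YARN.

  import org.apache.spark.sql.SparkSession

  // Illustrative values only; compare the effective values of these settings
  // between the old and the new cluster shape (Spark UI, Environment tab).
  val spark = SparkSession.builder()
    .appName("dynamic-allocation-check")
    // Dynamic allocation requests a new executor once tasks have been
    // backlogged longer than schedulerBacklogTimeout (default 1s in 2.4).
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")  // required with dynamic allocation on YARN
    .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
    .config("spark.dynamicAllocation.maxExecutors", "50")
    // Task count is driven by these, not by dynamic allocation itself:
    //   spark.default.parallelism         (defaults to total executor cores for RDD jobs)
    //   spark.sql.files.maxPartitionBytes (input split size, default 128 MB)
    //   spark.sql.shuffle.partitions      (tasks after a shuffle, default 200)
    .getOrCreate()

  // Print the relevant effective settings for a side-by-side comparison.
  spark.conf.getAll
    .filter { case (k, _) =>
      k.startsWith("spark.dynamicAllocation") || k.startsWith("spark.sql.files") }
    .foreach { case (k, v) => println(s"$k = $v") }

If any of the task-count settings (or the executor core count that feeds the default
parallelism) picked up a different value on the new cluster shape, that alone could
produce a different number of tasks for identical input files.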