Hey Mich,

This cluster is running Spark 2.4.6 on EMR.

On Mon, Feb 27, 2023 at 12:20 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> What is the Spark version, and what type of cluster is it: Spark on
> Dataproc or something else?
>
> HTH
>
> On Mon, 27 Feb 2023 at 09:06, murat migdisoglu <murat.migdiso...@gmail.com> wrote:
>
>> On an auto-scaling cluster that uses YARN as the resource manager, we observed
>> that after moving to larger instance types and decreasing the number of worker
>> nodes (keeping the total CPU/memory capacity of the cluster identical), the
>> number of tasks for the same Spark job spikes.
>>
>> The same Spark job, with the same Spark settings (dynamic allocation is on),
>> spins up 4-5 times as many tasks, and correspondingly we see 4-5 times as many
>> executors being allocated.
>>
>> As far as I understand, dynamic allocation decides to start a new executor when
>> it sees pending tasks queuing up (the relevant settings are sketched at the end
>> of this thread). But I don't understand why the same Spark application, with
>> identical input files, runs 4-5 times as many tasks.
>>
>> Any clues would be much appreciated. Thank you.
>>
>> Murat
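For reference, a minimal Scala sketch of the settings that govern the two behaviours
described in the quoted question: how quickly dynamic allocation asks for executors
when tasks queue up, and what determines how many tasks a stage gets in the first
place. The app name and values below are illustrative, not the settings of the job
in question; it assumes a SparkSession-based job on Spark 2.4 / YARN.

  import org.apache.spark.sql.SparkSession

  // Illustrative values only; compare the effective values of these settings
  // between the old and the new cluster shape (Spark UI, Environment tab).
  val spark = SparkSession.builder()
    .appName("dynamic-allocation-check")
    // Dynamic allocation requests a new executor once tasks have been
    // backlogged longer than schedulerBacklogTimeout (default 1s in 2.4).
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")  // required with dynamic allocation on YARN
    .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
    .config("spark.dynamicAllocation.maxExecutors", "50")
    // Task count is driven by these, not by dynamic allocation itself:
    //   spark.default.parallelism         (defaults to total executor cores for RDD jobs)
    //   spark.sql.files.maxPartitionBytes (input split size, default 128 MB)
    //   spark.sql.shuffle.partitions      (tasks after a shuffle, default 200)
    .getOrCreate()

  // Print the relevant effective settings for a side-by-side comparison.
  spark.conf.getAll
    .filter { case (k, _) =>
      k.startsWith("spark.dynamicAllocation") || k.startsWith("spark.sql.files") }
    .foreach { case (k, v) => println(s"$k = $v") }

If any of the task-count settings (or the executor core count that feeds the default
parallelism) picked up a different value on the new cluster shape, that alone could
produce a different number of tasks for identical input files.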