Hi,
I am working on a Spark job, and the count() alone takes about 10 minutes. My
question is: how can I make it faster?
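
For context, the job boils down to something like this (a simplified PySpark sketch; the input path and format are placeholders, not my real ones):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Placeholder input; the real dataset is what produces the 76,553 tasks.
    df = spark.read.parquet("gs://my-bucket/input/")

    # This single action is what takes ~10 minutes.
    print(df.count())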
From the above image, what I understand is that 4,001 tasks are running in
parallel, out of 76,553 total tasks.
Here are the parameters I am using for the job (applied roughly as in the sketch after this list):
- master machine type - e2-standard-16
- worker machine type - e2-standard-8 (8 vcpus, 32 GB memory)
- number of workers - 400
- spark.executor.cores - 4
- spark.executor.memory - 11g
- spark.sql.shuffle.partitions - 10000
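
Setting these on the session builder would look roughly like this (a sketch; in practice they may be passed at submit time instead, and the app name is a placeholder):

    from pyspark.sql import SparkSession

    # Only the three spark.* values are my real settings;
    # "count-job" is just an illustrative app name.
    spark = (
        SparkSession.builder
        .appName("count-job")
        .config("spark.executor.cores", "4")
        .config("spark.executor.memory", "11g")
        .config("spark.sql.shuffle.partitions", "10000")
        .getOrCreate()
    )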
Please advise: how can I make this faster?
Thanks