Hi,

I am working on a Spark job. The count() alone takes 10 minutes. How can I make it faster?


From the attached Spark UI screenshot, what I understand is that 4,001 tasks are running in parallel, out of 76,553 total tasks.

Here are the parameters that I am using for the job
    - master machine type - e2-standard-16
    - worker machine type - e2-standard-8 (8 vcpus, 32 GB memory)
    - number of workers - 400
    - spark.executor.cores - 4
    - spark.executor.memory - 11g
    - spark.sql.shuffle.partitions - 10000
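For reference, these settings correspond to a Dataproc submit command roughly like the following (the cluster name, region, and job file are placeholders, not the actual values):

```shell
# Sketch of the submit command for the settings above.
# CLUSTER_NAME, REGION, and job.py are hypothetical placeholders.
gcloud dataproc jobs submit pyspark job.py \
  --cluster=CLUSTER_NAME \
  --region=REGION \
  --properties=\
spark.executor.cores=4,\
spark.executor.memory=11g,\
spark.sql.shuffle.partitions=10000
```

With e2-standard-8 workers (8 vCPUs, 32 GB) and 4 cores / 11g per executor, each worker should fit roughly 2 executors, so 400 workers gives on the order of 3,200 concurrent task slots.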


Please advise on how I can make this faster.

Thanks
