Hi,
I am working on a Spark job, and the count() alone takes about 10 minutes. My
question is: how can I make it faster?
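
For context, the job boils down to something like this (a simplified PySpark sketch; the input path and format are placeholders, not my real ones):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Placeholder input; the real dataset is what produces the 76,553 tasks.
    df = spark.read.parquet("gs://my-bucket/input/")

    # This single action is what takes ~10 minutes.
    print(df.count())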
From the above image, what I understand is that 4,001 tasks are running in
parallel, out of 76,553 total tasks.
Here are the parameters I am using for the job (applied roughly as in the sketch after this list):
- master machine type - e2-standard-16
- worker machine type - e2-standard-8 (8 vcpus, 32 GB memory)
- number of workers - 400
- spark.executor.cores - 4
- spark.executor.memory - 11g
- spark.sql.shuffle.partitions - 10000
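
Setting these on the session builder would look roughly like this (a sketch; in practice they may be passed at submit time instead, and the app name is a placeholder):

    from pyspark.sql import SparkSession

    # Only the three spark.* values are my real settings;
    # "count-job" is just an illustrative app name.
    spark = (
        SparkSession.builder
        .appName("count-job")
        .config("spark.executor.cores", "4")
        .config("spark.executor.memory", "11g")
        .config("spark.sql.shuffle.partitions", "10000")
        .getOrCreate()
    )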
Please advise: how can I make this faster?
Thanks