Hi, I am working on a Spark job in which the `count()` alone takes about 10 minutes. How can I make it faster?
From the above image, what I understand is that 4,001 tasks are running in parallel, out of 76,553 total tasks. Here are the parameters I am using for the job:

- master machine type: e2-standard-16
- worker machine type: e2-standard-8 (8 vCPUs, 32 GB memory)
- number of workers: 400
- spark.executor.cores: 4
- spark.executor.memory: 11g
- spark.sql.shuffle.partitions: 10000

Please advise on how I can make this faster. Thanks!
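For context, here is a rough capacity calculation for this cluster. It assumes (not stated in the post) that roughly two 11g executors fit on each 32 GB worker after YARN/overhead reservations, so the executor count ends up core-bound rather than memory-bound:

```python
# Back-of-the-envelope parallelism for the cluster described above.
# Assumption: ~2 executors of 11g (+ overhead) fit per 32 GB worker,
# so executors per worker are limited by cores, not memory.

workers = 400
vcpus_per_worker = 8          # e2-standard-8
executor_cores = 4            # spark.executor.cores

executors_per_worker = vcpus_per_worker // executor_cores  # -> 2
executors = workers * executors_per_worker                 # -> 800
task_slots = executors * executor_cores                    # concurrent tasks

total_tasks = 76_553
waves = -(-total_tasks // task_slots)  # ceiling division

print(task_slots)  # 3200 concurrent task slots
print(waves)       # 24 waves of tasks to drain the stage
```

Under these assumptions the stage runs in about 24 waves over 3,200 task slots, which is one way to reason about whether more workers, fewer shuffle partitions, or larger executors would help.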