Hi all, I have a PySpark SQL script that loads one 80 MB table, one 2 MB table, and three other small tables, and then performs a lot of joins to fetch the data.
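To make the shape of the job concrete, it follows roughly this pattern (table names, join keys, and the write mode below are placeholders, not the actual ones):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join_job").getOrCreate()

# Source tables (approximate sizes: ~80 MB, ~2 MB, and three small lookup tables)
big_df = spark.table("db.big_table")      # ~80 MB
mid_df = spark.table("db.mid_table")      # ~2 MB
lookup1 = spark.table("db.lookup_one")
lookup2 = spark.table("db.lookup_two")
lookup3 = spark.table("db.lookup_three")

# Chain of joins to assemble the final result (join keys are placeholders)
result = (
    big_df
    .join(mid_df, "key_a")
    .join(lookup1, "key_b")
    .join(lookup2, "key_c")
    .join(lookup3, "key_d")
)

# The resulting data frame (~24 MB of records) is written into the target table
result.write.mode("overwrite").saveAsTable("db.target_table")
```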
My cluster has 4 nodes, 300 GB of memory, and 64 cores. Writing a data frame of about 24 MB of records into a table takes 4 minutes 2 seconds with the following parameters:

- Driver memory: 5 GB
- Executor memory: 20 GB
- Executor cores: 5
- Number of executors: 40
- spark.dynamicAllocation.minExecutors: 40
- spark.dynamicAllocation.maxExecutors: 40
- spark.dynamicAllocation.initialExecutors: 17
- Memory overhead: 4 GB
- Shuffle partitions: default 200

Could anyone please suggest how I can tune this code?

-- k.Lakshmi Nivedita
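PS: for completeness, this is roughly how the settings listed above map onto Spark configuration keys if they were set when the session is created (the app name is a placeholder, and some of them, such as driver and executor memory, normally have to be supplied at submit time rather than inside the script):

```python
from pyspark.sql import SparkSession

# Sketch only: mirrors the parameters listed above, not a recommendation
spark = (
    SparkSession.builder
    .appName("join_job")
    .config("spark.driver.memory", "5g")
    .config("spark.executor.memory", "20g")
    .config("spark.executor.cores", "5")
    .config("spark.executor.instances", "40")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "40")
    .config("spark.dynamicAllocation.maxExecutors", "40")
    .config("spark.dynamicAllocation.initialExecutors", "17")
    .config("spark.executor.memoryOverhead", "4g")
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)
```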