Hi all, I have a directory containing 12 files. I want each file to be read whole, so I am reading them with wholeTextFiles(dirpath, numPartitions).
I run spark-submit as <all other stuff> --num-executors 12 --executor-cores 1, with numPartitions set to 12. However, when the job runs, the stage that reads the directory has only 8 tasks, so some tasks read more than one file and take twice as long. What can I do so that the files are read by 12 tasks, i.e. one file per task? Thanks, Pradeep
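A possible explanation (an assumption, not confirmed from the post): the numPartitions argument to wholeTextFiles is only a minimum hint. Spark uses a combining input format whose max split size is roughly total_bytes / minPartitions, and it packs whole files into splits up to that cap, so uneven file sizes can produce fewer splits (tasks) than requested. The plain-Python sketch below simulates that packing heuristic with made-up file sizes to show the effect; estimate_splits and the size lists are illustrative, not Spark APIs.

```python
import math

def estimate_splits(file_sizes, min_partitions):
    """Greedy sketch of combine-style split packing (simplified assumption):
    files are added to the current split until total_bytes/min_partitions
    is exceeded, then a new split starts. Real Hadoop packing also
    considers node/rack locality, so this is only an approximation."""
    max_split = math.ceil(sum(file_sizes) / min_partitions)
    splits, current = 0, 0
    for size in sorted(file_sizes, reverse=True):
        if current and current + size > max_split:
            splits += 1
            current = 0
        current += size
    return splits + (1 if current else 0)

# 12 equal-sized files: each file fills one split exactly.
print(estimate_splits([100] * 12, 12))  # -> 12
# 12 uneven files: small files get combined, yielding fewer splits.
print(estimate_splits([400] * 4 + [25] * 8, 12))  # -> 6, fewer than 12
```

If this is indeed the cause, a common workaround is to call .repartition(12) on the RDD after reading, which redistributes the records even though the read stage itself stays at 8 tasks.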