How a Spark job reads data sources depends on the underlying source system and on the job configuration, such as the number of executors and the number of cores per executor.
https://spark.apache.org/docs/latest/rdd-programming-guide.html#external-datasets
See also the section on shuffle operations in the same guide.
Hi,
I am trying to thoroughly understand the concepts below in Spark.
1. A job reads two files and performs a cartesian join.
2. The input sizes are 55.7 MB and 67.1 MB.
3. After reading the input files, Spark performed a shuffle; for both inputs the
shuffle size was in KB. I want to understand why this size is