Re: [apache-spark]-spark-shuffle

2020-05-24 Thread vijay.bvp
How a Spark job reads data sources depends on the underlying source system and on the job configuration, in particular the number of executors and cores per executor. See https://spark.apache.org/docs/latest/rdd-programming-guide.html#external-datasets and the "Shuffle operations" section of the same guide.
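As a minimal sketch of what that means in practice (the file path and the config values are assumptions, not from the original job), the executor settings and the number of input partitions can be inspected like this:

    import org.apache.spark.sql.SparkSession

    object ReadPartitionsExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("read-partitions-example")
          // These would normally be set via spark-submit or cluster defaults.
          .config("spark.executor.instances", "2")
          .config("spark.executor.cores", "2")
          .getOrCreate()

        // How many partitions the read produces depends on the source system
        // (e.g. HDFS block size, file splits) and Spark's own defaults.
        val df = spark.read.text("/path/to/input.txt")
        println(s"Input partitions: ${df.rdd.getNumPartitions}")

        spark.stop()
      }
    }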

[apache-spark]-spark-shuffle

2020-05-22 Thread Vijay Kumar
Hi, I am trying to thoroughly understand the concepts below in Spark.
1. A job reads 2 files and performs a cartesian join.
2. The input sizes are 55.7 MB and 67.1 MB.
3. After reading the input files, Spark performed a shuffle; for both inputs the shuffle size was in KB.
I want to understand why this shuffle size is so small compared to the input sizes.
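A hedged reconstruction of the scenario described above (the file paths and the use of DataFrame crossJoin are assumptions; the original job may have used RDD.cartesian or SQL instead). Running something like this and opening the Spark UI shows the shuffle read/write sizes for the join stages:

    import org.apache.spark.sql.SparkSession

    object CartesianShuffleExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cartesian-shuffle-example")
          .getOrCreate()

        val left  = spark.read.text("/path/to/input1")   // ~55.7 MB in the question
        val right = spark.read.text("/path/to/input2")   // ~67.1 MB in the question

        // crossJoin produces the cartesian product of the two inputs.
        val product = left.crossJoin(right)

        // Force execution so the shuffle metrics show up in the Spark UI.
        println(s"Rows in cartesian product: ${product.count()}")

        spark.stop()
      }
    }

One thing to keep in mind when comparing the numbers: shuffle write sizes reported in the UI are for serialized, compressed map output, so they are not directly comparable to the on-disk size of the inputs.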