OK, I ran a small test which, if my assertion is valid, shows that the
shuffle spills to memory first and then to disk.
The sample code I wrote is as follows:
import sys
from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import
Hi,
On the issue of Spark shuffle, it is accepted that a shuffle *often involves*
some, if not all, of the following:
- Disk I/O
- Data serialization and deserialization
- Network I/O
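
To make the first two costs above concrete, here is a toy sketch in plain
Python. It is not Spark's shuffle implementation and does not use any Spark
API. The function name toy_shuffle and its parameters (num_partitions,
max_in_memory, spill_dir) are made up for illustration. It buffers records
per partition in memory, pickles the buffer to a spill file once a record
threshold is reached, and later reads the spill files back to merge them.
The network transfer step is left out entirely.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def toy_shuffle(records, num_partitions, max_in_memory, spill_dir):
    """Group (key, value) records by partition, spilling to disk when full.

    Toy illustration only; hypothetical helper, not how Spark does it.
    Returns (partition id -> records, number of spill files written).
    """
    buffer = defaultdict(list)  # partition id -> records held in memory
    buffered = 0                # number of records currently in memory
    spill_files = []

    def spill():
        nonlocal buffered
        # Serialization + disk I/O: pickle the in-memory buffer to a file.
        path = os.path.join(spill_dir, f"spill-{len(spill_files)}.pkl")
        with open(path, "wb") as f:
            pickle.dump(dict(buffer), f)
        spill_files.append(path)
        buffer.clear()
        buffered = 0

    for key, value in records:
        # Hash partitioning: records with the same key land in one partition.
        buffer[hash(key) % num_partitions].append((key, value))
        buffered += 1
        if buffered >= max_in_memory:
            spill()

    # Merge phase: deserialize every spill file, then add what is still
    # sitting in memory.
    merged = defaultdict(list)
    for path in spill_files:
        with open(path, "rb") as f:
            for pid, recs in pickle.load(f).items():
                merged[pid].extend(recs)
    for pid, recs in buffer.items():
        merged[pid].extend(recs)
    return dict(merged), len(spill_files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = [(i, i * i) for i in range(10)]
        partitions, spills = toy_shuffle(data, 3, 4, tmp)
        print("spill files written:", spills)
        for pid in sorted(partitions):
            print(pid, partitions[pid])
```

Lowering max_in_memory forces more spill files and therefore more
serialization and disk I/O for the same input, which is the trade-off the
list above describes.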
Excluding the external shuffle service, and without relying on the
configuration options provided by Spark for