Hello, I am running unit tests with Spark DataFrames, and I am looking for configuration tweaks that would make tests faster. Usually, I use a local[2] or local[4] master.
Something that has been bothering me is that most of my stages end up using 200 partitions, regardless of whether I repartition the input. That seems like overkill for small unit tests that barely have 200 rows per DataFrame. I believe spark.sql.shuffle.partitions used to control this, but it seems to be gone, and I could not find any information on what mechanism or setting replaces it, or the corresponding JIRA. Does anyone have experience to share on how best to tune Spark for very small local runs like this? Thanks!
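For reference, this is roughly the session setup I have in mind for the test harness. It's only a sketch: the config keys are the ones I believe apply (please correct me if they have changed), and the values are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of a SparkSession for unit tests. Assumes the
// spark.sql.shuffle.partitions setting still exists and behaves as it
// used to; the chosen values are illustrative, not recommendations.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("unit-tests")
  // Match shuffle parallelism to the tiny test data instead of the
  // 200-partition default I keep seeing in my stages.
  .config("spark.sql.shuffle.partitions", "4")
  // Skip the web UI in tests to shave a bit of startup overhead.
  .config("spark.ui.enabled", "false")
  .getOrCreate()
```

With something like this, I'd expect post-shuffle stages (joins, aggregations) to run with 4 tasks instead of 200, but I'd welcome confirmation that this is still the right knob.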