Hello, I am running unit tests with Spark DataFrames, and I am looking for configuration tweaks that would make tests faster. Usually, I use a local[2] or local[4] master.
Something that has been bothering me is that most of my stages end up using 200 partitions, regardless of whether I repartition the input. That seems like overkill for small unit tests that barely have 200 rows per DataFrame. I believe spark.sql.shuffle.partitions used to control this, but it seems to be gone, and I could not find any information on what mechanism or setting replaces it, or the corresponding JIRA. Does anyone have experience to share on how best to tune Spark for very small local runs like this? Thanks!
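For reference, this is roughly the session setup I have in mind for the test harness. It's only a sketch: the config keys are the ones I believe apply (please correct me if they have changed), and the values are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of a SparkSession for unit tests. Assumes the
// spark.sql.shuffle.partitions setting still exists and behaves as it
// used to; the chosen values are illustrative, not recommendations.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("unit-tests")
  // Match shuffle parallelism to the tiny test data instead of the
  // 200-partition default I keep seeing in my stages.
  .config("spark.sql.shuffle.partitions", "4")
  // Skip the web UI in tests to shave a bit of startup overhead.
  .config("spark.ui.enabled", "false")
  .getOrCreate()
```

With something like this, I'd expect post-shuffle stages (joins, aggregations) to run with 4 tasks instead of 200, but I'd welcome confirmation that this is still the right knob.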