Github user megaserg commented on the issue:
https://github.com/apache/spark/pull/20704
Thank you @dongjoon-hyun! This was also affecting our Spark job performance!
We're using `mapreduce.fileoutputcommitter.algorithm.version=2` in our
Spark job config, as recommended.
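For context, a sketch of how a Hadoop setting like this is commonly passed to a Spark job: Spark forwards any `spark.hadoop.*`-prefixed configuration into the Hadoop `Configuration`. The script name `your_job.py` is a placeholder, not from this thread.

```
# Forward the committer setting via Spark's spark.hadoop.* prefix.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  your_job.py
```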
Github user megaserg commented on the issue:
https://github.com/apache/spark/pull/18990
Sorry, I edited the pull request body. @srowen's comment above was
referring to the initial version, where I proposed using the default,
non-deterministic `Random()` constructor.
---
If
GitHub user megaserg opened a pull request:
https://github.com/apache/spark/pull/18990
[SPARK-21782][Core] Repartition creates skews when numPartitions is a power
of 2
## Problem
When an RDD (particularly one with a low items-per-partition ratio) is
repartitioned to numPartitions
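The description is cut off above, but the skew described in SPARK-21782 can be illustrated with a small simulation. Before this fix, `repartition` chose each source partition's starting target as (roughly) `new Random(partitionIndex).nextInt(numPartitions)` and round-robined from there. The sketch below replicates `java.util.Random` in Python, using the published Java LCG constants, to show why consecutive partition indices as seeds collapse onto a few targets when the bound is a power of two; it is a minimal model of the seeding behavior, not Spark's actual code.

```python
from collections import Counter

# Published constants of java.util.Random's 48-bit LCG.
MULT = 0x5DEECE66D
ADD = 0xB
MASK = (1 << 48) - 1

def first_next_int(seed: int, bound: int) -> int:
    """First value of `new java.util.Random(seed).nextInt(bound)`
    for a power-of-two bound (that path keeps the TOP bits)."""
    s = (seed ^ MULT) & MASK      # constructor's seed scramble
    s = (s * MULT + ADD) & MASK   # one LCG step, i.e. next(31)
    r = s >> 17                   # 31-bit output
    return (bound * r) >> 31      # nextInt's power-of-two branch

# Seed with the partition index, as the pre-fix code effectively did.
num_partitions = 8                # a power of two
starts = [first_next_int(i, num_partitions) for i in range(1000)]
counts = Counter(starts)
print(sorted(counts.items()))
```

Consecutive seeds differ only in their low bits, and a single multiplier step cannot spread that difference into the top bits that the power-of-two `nextInt` path reads, so the 1000 starting offsets land on only a handful of the 8 targets. With roughly one item per source partition, those few targets receive nearly all the data, which is the skew this PR addresses by hashing the partition index before seeding.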