Hi,
Have you tried
https://spark.apache.org/docs/latest/sql-performance-tuning.html#spliting-skewed-shuffle-partitions
?
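For reference, the skew splitting described on that page is part of Adaptive Query Execution (AQE). A minimal sketch, assuming Spark 3.2+, a running `spark` session, and a hypothetical table `events` with a skewed column `key`:

```scala
// AQE is enabled by default since Spark 3.2; this setting additionally lets AQE
// split oversized shuffle partitions produced by a REBALANCE hint.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true")

// Rebalance the shuffle by `key`; AQE splits partitions that exceed the
// target size instead of leaving one hot key in a single huge partition.
val rebalanced = spark.sql("SELECT /*+ REBALANCE(key) */ * FROM events")
```

Note that AQE splits partitions of the same key, so this suits aggregations and joins rather than cases that require strict per-key partition ordering.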
Another way of handling the skew is to split the job into multiple (2 or
more) stages, using a random salt as part of the key in the intermediate stages.
In the above case,
val maxSa
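The two-stage salting idea can be sketched as follows, assuming a DataFrame `df` with a skewed grouping column `key` and a numeric column `v` (all names hypothetical), and an algebraic aggregation such as sum:

```scala
import org.apache.spark.sql.functions._

val numSalts = 8 // assumed fan-out; tune to the observed skew

// Stage 1: append a random salt so rows of a hot key spread across up to
// `numSalts` reduce tasks, then pre-aggregate on the composite (key, salt).
val partial = df
  .withColumn("salt", (rand() * numSalts).cast("int"))
  .groupBy(col("key"), col("salt"))
  .agg(sum("v").as("partialSum"))

// Stage 2: drop the salt and combine the per-salt partial sums for each key.
val result = partial
  .groupBy("key")
  .agg(sum("partialSum").as("total"))
```

The trade-off is an extra shuffle: stage 1 caps any single task at roughly 1/`numSalts` of the hot key's rows, and stage 2 only combines the small pre-aggregated results.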
Hi Team,
I'm using repartition and sortWithinPartitions to maintain field-based
ordering across partitions, but I'm facing data skew among the
partitions. I have 96 partitions and 500 distinct keys.
While reviewing the Spark UI, I noticed that a few partitions are
underutilized