Re: Handling load distribution and addressing data skew.

2024-08-19 Thread Raghavendra Ganesh
Hi, have you tried https://spark.apache.org/docs/latest/sql-performance-tuning.html#spliting-skewed-shuffle-partitions ? Another way of handling the skew is to split the task into multiple (2 or more) stages that use a random salt as part of the key in the intermediate stages. In the above case, val maxSa
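The two-stage salting idea can be sketched without Spark using plain Scala collections. This is a minimal illustration under stated assumptions, not the code from the reply (which is truncated above). `Rec`, `stage1`, `stage2`, `partitionOf` and `rowsPerPartition` are hypothetical names, the per-key sum stands in for whatever work the real job does, and `partitionOf` uses `hashCode` as a stand-in for hash partitioning (Spark SQL's `repartition(col)` hashes with Murmur3, so real partition assignments differ).

```scala
import scala.util.Random

object SaltedAggregation {
  // Hypothetical record type: a grouping key and a value to be summed.
  final case class Rec(key: String, value: Long)

  // Stand-in for hash partitioning: non-negative hashCode modulo numPartitions.
  // (Spark SQL uses Murmur3 for repartition(col), so real assignments differ.)
  def partitionOf(k: Any, numPartitions: Int): Int = {
    val m = k.hashCode % numPartitions
    if (m < 0) m + numPartitions else m
  }

  // Stage 1: attach a random salt in [0, saltBuckets) and pre-aggregate per (key, salt).
  // A hot key is now spread over up to saltBuckets distinct shuffle keys.
  def stage1(recs: Seq[Rec], saltBuckets: Int, rng: Random): Map[(String, Int), Long] =
    recs
      .map(r => ((r.key, rng.nextInt(saltBuckets)), r.value))
      .groupMapReduce(_._1)(_._2)(_ + _)

  // Stage 2: drop the salt and merge the (much smaller) partial results per original key.
  def stage2(partials: Map[(String, Int), Long]): Map[String, Long] =
    partials.toSeq.groupMapReduce(_._1._1)(_._2)(_ + _)

  // Rows per partition if the given shuffle keys were hash-partitioned.
  def rowsPerPartition(shuffleKeys: Seq[Any], numPartitions: Int): Map[Int, Int] =
    shuffleKeys.groupMapReduce(k => partitionOf(k, numPartitions))(_ => 1)(_ + _)
}
```

Partitioning the rows of a single hot key by the key alone puts them all in one partition, while partitioning by `(key, salt)` spreads them over up to `saltBuckets` partitions; stage 2 then only has to combine at most `saltBuckets` partial results per key. Note that this only works when the per-key work can be split and merged (sums, counts, sorted runs that are merged later, and so on). If every row of a key must end up in one partition, as with the ordering requirement in the original question, the salted layout changes that guarantee.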

Handling load distribution and addressing data skew.

2024-08-16 Thread Karthick
Hi Team, I'm using repartition and sortWithinPartitions to maintain field-based ordering across partitions, but I'm facing data skew among the partitions. I have 96 partitions and I'm working with 500 distinct keys. While reviewing the Spark UI, I noticed that a few partitions are underutiliz
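One reason the setup above can be uneven: repartitioning by a key keeps every row of that key in a single partition, and 500 keys hashed into 96 partitions average about 5.2 keys per partition, but hashing does not divide them evenly. The plain-Scala sketch below counts distinct keys per partition. It is a sketch under assumptions: the key names `key-1` to `key-500` are hypothetical, and `String.hashCode` stands in for Spark SQL's Murmur3-based partitioning, so the exact counts will not match a real job. Differences in row counts between keys add further imbalance on top of this.

```scala
object KeyDistribution {
  // Non-negative modulo, mapping a hash code onto a partition index.
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val m = key.hashCode % numPartitions
    if (m < 0) m + numPartitions else m
  }

  // Number of distinct keys landing in each partition (empty partitions included).
  def keysPerPartition(keys: Seq[String], numPartitions: Int): Vector[Int] = {
    val counts = Array.fill(numPartitions)(0)
    keys.distinct.foreach(k => counts(partitionOf(k, numPartitions)) += 1)
    counts.toVector
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical key names; the real keys from the thread are unknown.
    val keys = (1 to 500).map(i => s"key-$i")
    val perPartition = keysPerPartition(keys, 96)
    println(s"min=${perPartition.min} max=${perPartition.max} empty=${perPartition.count(_ == 0)}")
  }
}
```

Comparing the minimum and maximum printed here with the task sizes in the Spark UI can help tell whether the imbalance comes mainly from how keys are spread over partitions or from a few keys carrying most of the rows.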