Hi Team,

I'm using repartition followed by sortWithinPartitions to maintain
field-based ordering across partitions, but I'm running into data skew
across the partitions. I have 96 partitions and I'm working with about
500 distinct keys. While reviewing the Spark UI, I noticed that a few
partitions hold very little data while others are heavily loaded.
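
For reference, here is a minimal sketch of the pattern I'm using. The
column names "key" and "ts" and the paths are placeholders, not my real
schema:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object SkewSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("repartition-sortWithinPartitions-sketch")
          .getOrCreate()

        // Placeholder input; stands in for the real source table.
        val df = spark.read.parquet("/path/to/input")

        // Hash-partition on the key column into 96 partitions, then
        // sort rows inside each partition so records for a key stay
        // ordered within that partition.
        val ordered = df
          .repartition(96, col("key"))
          .sortWithinPartitions(col("key"), col("ts"))

        ordered.write.mode("overwrite").parquet("/path/to/output")
      }
    }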

This looks like a hash-partitioning problem: with only 500 distinct keys
hashed into 96 partitions, several keys can land in the same partition
while others get few or none, and any heavy keys make the imbalance
worse. Can anyone suggest a better hashing technique or approach to
mitigate this?
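
In case it helps anyone reproduce the imbalance outside the Spark UI,
this is a rough way to count rows per partition, building on the sketch
above (ordered is the DataFrame from that sketch):

    import org.apache.spark.sql.functions.{col, spark_partition_id}

    // Rows per partition after the repartition above; the most and
    // least loaded partitions show the skew directly.
    ordered
      .groupBy(spark_partition_id().as("partition"))
      .count()
      .orderBy(col("count").desc)
      .show(96, truncate = false)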

Thanks in advance for your help.
