Hi Team,
I'm using repartition and sortWithinPartitions to maintain field-based
ordering across partitions, but I'm seeing data skew across the
partitions. I have 96 partitions and 500 distinct keys.
While reviewing the Spark UI, I noticed that a few partitions are
underutiliz
Hi Shay,
Let me address the points you raised using the STAR methodology. I
apologize if it sounds a bit formal, but I find it effective for clarity.
*Situation*
You encountered an issue while working with a Spark DataFrame where a
shuffle was unexpectedly triggered during the application of a w
Hi Mich, thank you for answering - much appreciated.
> This can cause uneven distribution of data, triggering a shuffle for the window
> function.
Could you elaborate on the mechanism that can "trigger a shuffle for the window
function"? I'm not familiar with it. (or are you referring to AQE?)
In an
Hi Team,
I am trying to add a shutdown hook to my PySpark script using `atexit`.
However, it seems that whenever I send a SIGTERM to the spark-submit
process, it triggers the JVM shutdown hook first, which terminates
the Spark context.
I didn't understand in what order the Python