Re: Dealing with skew when write.distribution-mode=hash

2025-04-14 Thread namratha mk
Hi Ed, In recent versions of Spark (>3.5), for both the hash and range distribution modes, we can control the partition size with the Spark property "spark.sql.adaptive.advisoryPartitionSizeInBytes". This also keeps the small-files problem under control. Regards, Namratha On Mon, Apr 7, 2025 at 8:44 AM Ed Mancebo

Re: Dealing with skew when write.distribution-mode=hash

2025-04-14 Thread Anton Okolnychyi
AQE in recent Spark versions should take care of any skew during writes. Make sure it is enabled and configured correctly. - Anton On Mon, Apr 14, 2025 at 13:50, namratha mk wrote: > Hi Ed, > > In the latest version of spark(>3.5), for both hash and range > distribution mode we can control the siz
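
For reference, a minimal sketch of what "enabled and configured" could look like in practice. The two property names are real Spark settings; the 128 MiB value is only an illustrative choice, and `to_submit_flags` is a hypothetical helper that just renders the settings as spark-submit `--conf` flags.

```python
# Hypothetical helper (not part of Spark or Iceberg): renders a dict of
# Spark settings as spark-submit "--conf key=value" flags.
def to_submit_flags(conf: dict[str, str]) -> list[str]:
    flags: list[str] = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


aqe_conf = {
    # Turn on Adaptive Query Execution.
    "spark.sql.adaptive.enabled": "true",
    # Target partition size AQE aims for; 128 MiB is an assumed, illustrative value.
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": str(128 * 1024 * 1024),
}

print(" ".join(to_submit_flags(aqe_conf)))
```

The same keys can also be set on an existing session instead of on the command line; which advisory size works best depends on the table and target file size.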

Dealing with skew when write.distribution-mode=hash

2025-04-07 Thread Ed Mancebo
Hi all, First time posting here. I’m using MERGE INTO to upsert into a table with daily partitions. More recent days tend to have many more updates, which is causing skew in the write stage when write.distribution-mode=hash (the most recent day of data will get assigned to a single task, which ta
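
To make the skew described above concrete, here is a small simulation, not Spark or Iceberg code. It assumes a simplified model in which hash distribution sends every row with the same partition value to one task; the row counts, the task count, and the `assign_tasks` helper are all hypothetical.

```python
import zlib
from collections import Counter


def assign_tasks(rows_per_day: dict[str, int], num_tasks: int) -> Counter:
    """Simplified hash distribution: all rows of one day land in one task."""
    load: Counter = Counter()
    for day, rows in rows_per_day.items():
        # crc32 stands in for the real hash function; it is deterministic.
        task = zlib.crc32(day.encode("utf-8")) % num_tasks
        load[task] += rows
    return load


# Hypothetical update counts: the most recent day dominates.
rows_per_day = {
    "2025-04-01": 1_000,
    "2025-04-02": 1_200,
    "2025-04-03": 1_500,
    "2025-04-04": 2_000,
    "2025-04-05": 3_000,
    "2025-04-06": 5_000,
    "2025-04-07": 50_000,
}

load = assign_tasks(rows_per_day, num_tasks=8)
total = sum(rows_per_day.values())
print(f"largest task holds {max(load.values()) / total:.0%} of all rows")
```

Whatever the hash function, the task that receives the most recent day holds at least its 50,000 rows, so adding more tasks does not shorten the slowest one under this model.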