Hi Team,

Sample merge query:

    df.createOrReplaceTempView("source")

    MERGE INTO iceberg_hive_cat.iceberg_poc_db.iceberg_tab target
    USING (SELECT * FROM source) source
    ON target.col1 = source.col1  -- col1 is my bucket column
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *

This merge fails with an OutOfMemoryError (OOME).

The source dataset is a temporary view. It currently contains 1.5 million records and may grow to 20 million rows in the future. Its ids fall into 16 buckets, and the target Iceberg table also has 16 buckets. The merge only updates rows whose id matches and inserts rows whose id does not match. The table has 1700 columns.

The Spark dataset uses default partitioning. Do we need to bucket the Spark dataset on the bucket column as well?

Let me know if you need any further details.

Regards,
Gaurav
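
P.S. For a rough sense of scale, here is a back-of-the-envelope estimate of the uncompressed size of the source data. The 8 bytes per column value is purely my assumption (I have not measured the real row width), the even spread across buckets is also an assumption, and the function name is only for illustration, so please treat the result as an order-of-magnitude sketch.

```python
# Rough estimate of the uncompressed source size for the MERGE.
# ASSUMPTION: each column value takes about 8 bytes on average (not measured).
ASSUMED_BYTES_PER_VALUE = 8
NUM_COLUMNS = 1700   # columns in the table (from the post)
NUM_BUCKETS = 16     # buckets on the target table (from the post)


def estimate_size_gb(num_rows: int) -> float:
    """Return the estimated uncompressed size in GB (10**9 bytes)."""
    return num_rows * NUM_COLUMNS * ASSUMED_BYTES_PER_VALUE / 1e9


current_gb = estimate_size_gb(1_500_000)   # today's source volume
future_gb = estimate_size_gb(20_000_000)   # expected future volume

# Per-bucket figures ASSUME rows are spread evenly over the 16 buckets.
print(f"current: {current_gb:.1f} GB total, {current_gb / NUM_BUCKETS:.1f} GB per bucket")
print(f"future:  {future_gb:.1f} GB total, {future_gb / NUM_BUCKETS:.1f} GB per bucket")
```

Under that assumption the source is roughly 20 GB today and roughly 272 GB at 20 million rows, before any join or shuffle overhead from the merge itself.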