Hi,

I ran into an issue running a Spark SQL query and would be happy to get some advice. I'm trying to run a query over a very large data set (around 1.5TB), and it fails on every attempt. A template of the query is below:

insert overwrite table partition(part)
select /*+ BROADCAST(c) */
       *,
       row_number() over (partition by request_id order by economic_value DESC) as row_number
from (select a, b, c, d, e from table)  -- raw data, ~1.5TB
left join small_table
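For reference, here is what that row_number() window computes: within each request_id partition, rows are ranked by economic_value in descending order. A minimal pure-Python sketch with made-up data (the real query runs this in Spark over ~1.5TB):

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample rows; stand-ins for the real 1.5TB table.
rows = [
    {"request_id": "r1", "economic_value": 10.0},
    {"request_id": "r1", "economic_value": 25.0},
    {"request_id": "r2", "economic_value": 5.0},
]

# Emulate: row_number() over (partition by request_id order by economic_value desc)
# Sort by the partition key, then by economic_value descending within it.
rows.sort(key=lambda r: (r["request_id"], -r["economic_value"]))

numbered = []
for _, group in groupby(rows, key=itemgetter("request_id")):
    for i, r in enumerate(group, start=1):
        numbered.append({**r, "row_number": i})

# numbered: r1/25.0 -> 1, r1/10.0 -> 2, r2/5.0 -> 1
```

In Spark this forces a shuffle of all rows by request_id, which is why a skewed request_id distribution can sink the stage.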
The heavy part of this query is the window function. I'm running on 65 spot instances of type 5.4x.large, with the following Spark settings:

--conf spark.driver.memory=10g
--conf spark.sql.shuffle.partitions=1200
--conf spark.executor.memory=22000M
--conf spark.shuffle.service.enabled=false

Below is an example of the errors I get:

[image: image.png]

Any suggestions?

Thanks,
Tzahi