Hi,

I encountered an issue running a Spark SQL query and would be happy to get
some advice.
I'm trying to run a query on a very big data set (around 1.5 TB), and it
fails on every attempt. A template of the query is below:
insert overwrite table target_table partition(part)  -- target table name elided
select /*+ BROADCAST(c) */
  *,
  row_number() over (partition by request_id order by economic_value desc) as row_number
from (
  select a, b, c, d, e
  from raw_table  -- raw data, ~1.5 TB
) t
left join small_table c  -- aliased c so the BROADCAST(c) hint applies
  on ...  -- join condition omitted in this template

The heavy part of this query is the window function.
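For reference, here is a minimal PySpark sketch of that window expression
(column names are taken from the template above; df stands for the
already-joined data set):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Rank rows within each request_id by economic_value, highest first.
    w = Window.partitionBy("request_id").orderBy(F.col("economic_value").desc())
    ranked = df.withColumn("row_number", F.row_number().over(w))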
I'm using 65 spot instances of type 5.4x.large.
The Spark settings:
--conf spark.driver.memory=10g
--conf spark.sql.shuffle.partitions=1200
--conf spark.executor.memory=22000M
--conf spark.shuffle.service.enabled=false
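
For reference, here is a minimal PySpark sketch of how these settings could
be applied programmatically (the app name is a placeholder; note that
spark.driver.memory has to stay on the spark-submit command line, since the
driver JVM is already running by the time this code executes):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("window-query")  # placeholder name
        .config("spark.sql.shuffle.partitions", "1200")
        .config("spark.executor.memory", "22000m")
        .config("spark.shuffle.service.enabled", "false")
        .getOrCreate()
    )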


You can see below an example of the errors that I get:
[image: image.png]


Any suggestions?



Thanks!
Tzahi
