How many part files does that big table have? And what is the distribution of the request ID column — is its variance small or large?
The partitionBy clause will move all rows with the same request ID to a single
executor, so if that data is large it can put heavy load on one executor.
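A quick way to check that distribution is a simple group-by count. This is just a sketch — the table and column names (big_table, request_id) are placeholders for whatever the actual schema uses:

```sql
-- Hypothetical skew check: the heaviest request IDs show up first.
-- If the top counts dwarf the rest, partitionBy on this column will
-- funnel those rows to a single executor.
SELECT request_id, COUNT(*) AS cnt
FROM big_table
GROUP BY request_id
ORDER BY cnt DESC
LIMIT 20;
```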
On Sun, 25 Aug 2019 at
Hi,
I encountered an issue running a Spark SQL query, and would be happy to get
some advice.
I'm trying to run a query on a very big data set (around 1.5 TB) and it keeps
failing on all of my attempts. A template of the query is as below:
insert overwrite table partition(part)
select /*+ BROADCAST(c) */