I have no other advice for this. Does the situation improve after parameter
configuration?
Jason Jun wrote on Fri, Aug 5, 2022, at 06:55:
> Hi Qian,
>
> Thanks for your feedback. We're using Spark 3.1.2, and these are set:
>
> spark.ui.retainedJobs 10
> spark.ui.retainedStages 10
> spark.ui.retainedTasks 100
That's a good point about skewness and potential join optimizations. I will
try turning off all skew optimizations, forcing a sort-merge join, and
seeing if it then re-uses shuffle files on the static side.
Unfortunately, my static side is too large to broadcast; the streaming side
can be broadcast.
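For reference, a minimal sketch of the settings involved, in the same style as the properties above (names taken from the Spark 3.1 SQL configuration docs; verify defaults against your exact version). Setting the broadcast threshold to -1 disables automatic broadcast selection, which together with disabling AQE skew handling should steer the planner toward a plain sort-merge join:

spark.sql.adaptive.skewJoin.enabled false
spark.sql.autoBroadcastJoinThreshold -1
spark.sql.join.preferSortMergeJoin true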
I suspect this is because the incoming rows, when joined with the static
frame, can lead to a variable degree of skewness over time, and if so it is
probably better to employ different join strategies at run time. But if you
know your Dataset, I believe you can just do a broadcast join for your