At Netflix, we disable the broadcast timeout in our defaults.
I found that it never helped catch problems. With lazy evaluation, I think
it is reasonable for a table that should be broadcast to take a long time
to build. Just because a join uses a subset or aggregation of a large table
or requires

Hi all,
We have noticed a lot of broadcast timeouts on our pipelines, and from some
inspection, it seems that they happen when I have two threads trying to
save two different DataFrames. We use the FIFO scheduler, so if I launch a
job that needs all the executors, the second DataFrame's collect on