Re: Purpose of broadcast timeout

2019-01-30 Thread Ryan Blue
At Netflix, we disable the broadcast timeout in our defaults. I found that it never helped catch problems. With lazy evaluation, I think it is reasonable for a table that should be broadcast to take a long time to build. Just because a join uses a subset or aggregation of a large table or requires

Purpose of broadcast timeout

2019-01-30 Thread Justin Uang
Hi all, we have noticed a lot of broadcast timeouts on our pipelines, and from some inspection, it seems that they happen when I have two threads trying to save two different DataFrames. We use the FIFO scheduler, so if I launch a job that needs all the executors, the second DataFrame's collect on
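
The scheduling interaction described in this message can be illustrated with a toy model. The sketch below is not Spark code and makes no claim about Spark internals. It assumes that a single-worker thread pool is a fair stand-in for a cluster whose slots are all taken by the first FIFO job, and that the wait on the second job's result plays the role of the broadcast timeout. The names run_demo, big_job_s, small_job_s and broadcast_timeout_s are hypothetical and chosen only for illustration.

```python
import concurrent.futures as cf
import time


def run_demo(broadcast_timeout_s, big_job_s=0.5, small_job_s=0.05):
    """Return "ok" or "timeout" for the second, broadcast-like task.

    Toy model only: one worker thread stands in for a cluster whose
    slots are all occupied by the first job under FIFO ordering.
    """
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        # Job 1: submitted first, occupies the only worker.
        pool.submit(time.sleep, big_job_s)
        # Job 2: short on its own, but queued behind job 1.
        small = pool.submit(time.sleep, small_job_s)
        try:
            # The wait clock starts here, before job 2 has begun running.
            # timeout=None means wait indefinitely (timeout disabled).
            small.result(timeout=broadcast_timeout_s)
            return "ok"
        except cf.TimeoutError:
            return "timeout"


if __name__ == "__main__":
    print(run_demo(0.1))   # short timeout, job 2 is still queued
    print(run_demo(None))  # no timeout, job 2 eventually completes
```

In this model the short task fails only because of queueing delay, not because it is slow itself. Passing None for the timeout corresponds loosely to the "disable the timeout" approach mentioned in the reply above; whether the same reasoning carries over to a given Spark deployment is an assumption that should be checked against the scheduler configuration in use.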