Hello friends:

I have a theory question about call blocking in a Spark driver.


Consider this (admittedly contrived :)) snippet to illustrate the question...


x = rdd01.reduceByKey(lambda a, b: a + b)  # or some other shuffle-requiring transformation

b = sc.broadcast(x.take(20))  # or any statement that requires the previous statement to complete, cluster-wide

y = rdd02.someAction(f(b))  # someAction() and f() are just placeholders


Would the first or second statement above block the driver, because the second (or third) statement needs the previous one to complete, cluster-wide?


Maybe this isn't the best example (typed on a phone), but generally I'm trying to understand the scenario(s) where an RDD call in the driver may block because the DAG indicates that the next statement depends on the completion of the current one, cluster-wide (not just lazily evaluated).
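
To make my mental model concrete, here is a minimal sketch of where I'd expect the driver to block, assuming the standard PySpark API (the app name and sample data are placeholders I made up):

from pyspark import SparkContext

sc = SparkContext("local[*]", "blocking-demo")  # assumption: a local test context

pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

# Transformation: returns immediately. It only extends the DAG;
# no cluster work happens here.
reduced = pairs.reduceByKey(lambda a, b: a + b)

# Action: as I understand it, this is the first call that blocks the
# driver. take() schedules the job(s) needed to produce up to 20
# elements and waits for them to finish, shuffle included.
sample = reduced.take(20)

# By the time control returns here, the shuffle has completed
# cluster-wide, so broadcast() just ships an already-materialized
# local list to the executors.
b = sc.broadcast(sample)

Is that the right way to read it, i.e. the blocking always happens at the action, never at the transformation that precedes it?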

Thank you. :)


Sincerely yours,
Team Dimension Data
