Hello friends:

I have a theory question about call blocking in a Spark driver.


Consider this (admittedly contrived :)) snippet to illustrate the question...


x = rdd01.reduceByKey(lambda a, b: a + b)  # or some other shuffle-requiring transformation

b = sc.broadcast(x.take(20))  # or any statement that requires the previous statement to complete, cluster-wide

y = rdd02.someAction(f(b))  # someAction() and f() are just placeholders


Would the first or second statement above block the driver, because the second (or third) statement needs the previous one to complete, cluster-wide?


Maybe this isn't the best example (typed on a phone), but generally I'm trying to understand the scenario(s) where an RDD call in the driver may block because the DAG indicates that the next statement depends on the completion of the current one, cluster-wide (not just lazily evaluated).
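
To make my mental model concrete, here is a minimal sketch of where I'd expect the driver to block, assuming the standard PySpark API (the app name and sample data are placeholders I made up):

from pyspark import SparkContext

sc = SparkContext("local[*]", "blocking-demo")  # assumption: a local test context

pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

# Transformation: returns immediately. It only extends the DAG;
# no cluster work happens here.
reduced = pairs.reduceByKey(lambda a, b: a + b)

# Action: as I understand it, this is the first call that blocks the
# driver. take() schedules the job(s) needed to produce up to 20
# elements and waits for them to finish, shuffle included.
sample = reduced.take(20)

# By the time control returns here, the shuffle has completed
# cluster-wide, so broadcast() just ships an already-materialized
# local list to the executors.
b = sc.broadcast(sample)

Is that the right way to read it, i.e. the blocking always happens at the action, never at the transformation that precedes it?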

Thank you. :)


Sincerely yours,
Team Dimension Data
