The broadcast table doesn't seem to be reused across multiple actions.
e.g.
import org.apache.spark.sql.functions.broadcast

// Hint that small_df should be broadcast to every executor
val small_df_bc = broadcast(small_df)
big_df1.join(small_df_bc, Seq("id")).write.parquet("/test1")
big_df2.join(small_df_bc, Seq("id")).write.parquet("/test2")

In the Spark web UI we can see that the small DataFrame has been broadcast twice, once per action.

So how can we make the broadcast happen only once?
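
For reference, one related thing is caching the small DataFrame first; a minimal sketch below, assuming small_df fits comfortably in memory. Caching avoids re-reading small_df from the source for each write, but as far as I can tell each action still runs its own broadcast exchange, which is the part I'd like to avoid:

import org.apache.spark.sql.functions.broadcast

// Cache and materialize small_df so the underlying data is only read once
val small_df_cached = small_df.cache()
small_df_cached.count()

// Each write is still a separate action, and the web UI still shows
// a broadcast exchange per query even though the cached data is reused
big_df1.join(broadcast(small_df_cached), Seq("id")).write.parquet("/test1")
big_df2.join(broadcast(small_df_cached), Seq("id")).write.parquet("/test2")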

thanks a million.


