Hi,

When an execution plan is printed, it lists the tree of operators that will
be executed when the job runs. These operators have somewhat cryptic names
such as BroadcastHashJoin, Project, Filter, etc., and they do not appear to
map directly to the functions that are performed on a DataFrame or RDD.
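
For concreteness, here is a minimal sketch of the kind of job I mean (the
session setup, column names, and data are made up purely for illustration),
with comments noting which physical operators I would expect to see in the
plan printed by explain():

    import org.apache.spark.sql.SparkSession

    // Minimal sketch; names and data are hypothetical.
    val spark = SparkSession.builder()
      .appName("explain-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val users  = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    val orders = Seq((1, 10.0), (2, 20.0)).toDF("userId", "amount")

    val result = users
      .filter($"id" > 0)                    // typically appears as a Filter operator
      .select($"id", $"name")               // typically appears as a Project operator
      .join(orders, $"id" === $"userId")    // small table, so often a BroadcastHashJoin

    // Prints the parsed, analyzed, optimized, and physical plans.
    result.explain(true)

Running explain(true) shows the mapping for this particular job, but I have
not found documentation describing the general rules.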

1) Is there a place where each of these steps is documented?
2) Is there documentation, outside of Spark's source code, that describes
the mapping between operations on Spark DataFrames or RDDs and the resulting
physical execution plan? At least in enough detail to understand the
physical execution steps more accurately and to predict the steps that
particular actions would produce.

Regards,
