Hi,
I am wondering why some operations (like join, filter) do not show up as
stages in the web UI. For example, this code:
val simple = sc.parallelize(Array.range(0,100))
val simple2 = sc.parallelize(Array.range(0,100))
val toJoin = simple.map(x => (x, x.toString + x.toString))
val rdd = simple2
.map(x =>
The reason is that some operators get pipelined into a single stage.
For example, rdd.map(XX).filter(YY) executes in a single stage, since no
data movement (shuffle) is needed between these operations.
If you call toDebugString on the final RDD, it will give you some
information about the exact lineage.
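
To make the pipelining point concrete, here is a minimal sketch. It is not the code from the original question: the object name LineageSketch, the helper build, and the local[2] master are assumptions made for illustration. It chains a map and a filter (which should share one stage) with a reduceByKey (which needs a shuffle and so starts a new stage), then prints the lineage.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
// On Spark versions before 1.3 the pair-RDD implicits also need:
// import org.apache.spark.SparkContext._

object LineageSketch {
  // Hypothetical helper: builds a small RDD whose lineage has one shuffle.
  def build(sc: SparkContext): RDD[(Int, Int)] = {
    val numbers = sc.parallelize(Array.range(0, 100))
    // map and filter are narrow transformations, so they are pipelined
    // into the same stage and do not appear as separate stages in the UI.
    val pipelined = numbers
      .map(x => (x % 10, x))
      .filter { case (_, value) => value % 2 == 0 }
    // reduceByKey requires a shuffle, so a stage boundary is introduced here.
    pipelined.reduceByKey(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("lineage-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val reduced = build(sc)
      // Prints the lineage; the exact layout depends on the Spark version.
      println(reduced.toDebugString)
      reduced.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In the printed lineage, the map and filter should show up as RDDs within the same indentation level, with the shuffle from reduceByKey marking where the stage boundary falls.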
Let me try to answer your other question.
One sortByKey job is triggered by the RangePartitioner, which samples the data
to calculate the range boundaries; that sampling in turn triggers the first
reduceByKey. The second sortByKey job does the real work of sorting based on
the partitions calculated, which again triggers the