Re: Web UI doesn't show some stages

Patrick Wendell Wed, 20 Aug 2014 14:29:29 -0700

The reason is that some operators get pipelined into a single stage.
For example, rdd.map(XX).filter(YY) executes in a single stage, since no
data movement (shuffle) is needed between these operations.
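
To make the pipelining idea concrete, here is a rough analogy in plain Scala, with no Spark involved. Chaining map and filter on an iterator handles each element in a single pass, without materializing the intermediate result, which is roughly what happens to consecutive narrow operations inside one stage. The names PipelineSketch, run and trace are made up for this illustration, and it is only a sketch of the idea, not Spark's actual code path.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration only: plain Scala iterators, not Spark.
object PipelineSketch {
  // Records the order in which the two functions touch each element.
  val trace = ArrayBuffer.empty[String]

  def run(input: Seq[Int]): List[Int] = {
    trace.clear()
    input.iterator
      .map { x => trace += s"map($x)"; x * 2 }            // applied lazily, per element
      .filter { y => trace += s"filter($y)"; y % 4 == 0 } // sees each mapped value right away
      .toList                                             // pulls elements through both steps
  }

  def main(args: Array[String]): Unit = {
    println(run(Seq(1, 2, 3)))
    // The trace interleaves map and filter calls instead of
    // running all maps first and all filters afterwards.
    println(trace.mkString(" "))
  }
}
```

Because the two calls are interleaved per element, there is no point between them where a separate stage would need to start.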


If you call toDebugString on the final RDD, it will give you some
information about the exact lineage. In Spark 1.1 this will return
information about stage boundaries as well.
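
As a minimal sketch of how that might look, assuming a local master and a Spark dependency on the classpath, the snippet below builds a small RDD with a map, a filter and a reduceByKey, then prints its lineage. The object name LineageSketch, the helper build and the app name are made up for illustration. The exact text printed depends on the Spark version, so no output is shown here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on older 1.x releases
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of the original thread.
object LineageSketch {
  // map and filter need no data movement; reduceByKey introduces a shuffle.
  def build(sc: SparkContext): RDD[(Int, Int)] =
    sc.parallelize(0 until 100)
      .map(x => (x % 10, x))
      .filter { case (_, v) => v % 2 == 0 }
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption for running the sketch on one machine.
    val conf = new SparkConf().setAppName("lineage-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Prints the chain of RDDs that the final RDD depends on.
      println(build(sc).toDebugString)
    } finally {
      sc.stop()
    }
  }
}
```

The map and filter steps appear in the lineage even though the web UI does not list them as separate stages.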


On Wed, Aug 20, 2014 at 4:22 AM, Grzegorz Białek <
grzegorz.bia...@codilime.com> wrote:

> Hi,
>
> I am wondering why in web UI some stages (like join, filter) are not
> visible. For example this code:
>
> val simple = sc.parallelize(Array.range(0,100))
> val simple2 = sc.parallelize(Array.range(0,100))
>
>   val toJoin = simple.map(x => (x, x.toString + x.toString))
>   val rdd = simple2
>     .map(x => (scala.util.Random.nextInt(100), x))
>     .join(toJoin)
>     .map { case (r, (x, s)) => (r, x)}
>     .reduceByKey(_ + _)
>     .sortByKey()
>     .cache()
>   rdd.saveAsTextFile("output/1")
>
>   val rdd2 = toJoin
>     .groupBy{ case (x, _) => x}
>     .filter{ case (x, _) => x < 10}
>   rdd2.saveAsTextFile("output/2")
>
>   println(rdd2.join(toJoin).count())
>
> the UI doesn't show the join and filter stages and, moreover, it shows
> sortByKey and reduceByKey twice.
> Could anyone explain how it works?
>
> Thanks,
> Grzegorz
>
