On Thu, Jul 21, 2016 at 2:56 AM, C. Josephson <cjos...@uhana.io> wrote:

> I just started looking at the DAG for a Spark Streaming job, and had a
> couple of questions about it (image inline).
>
> 1.) What do the numbers in brackets mean, e.g. PythonRDD[805]?
>

Every RDD has an identifier (its id attribute) that is unique within its
SparkContext, which is the broadest scope an RDD can belong to. IDs are
handed out sequentially starting at 0, so PythonRDD[805] means this was the
806th RDD created in that SparkContext.
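To make the counting concrete, here is a toy model in plain Python, not Spark's code. It assumes, as a simplification, that each context draws RDD ids from a single counter starting at 0; `ToyContext`, `new_rdd_id` and `rdds_created_so_far` are hypothetical names for illustration only.

```python
import itertools


class ToyContext:
    """Hypothetical stand-in for a SparkContext: one id counter per context."""

    def __init__(self):
        # Counter starts at 0, so the first RDD gets id 0.
        self._next_rdd_id = itertools.count(0)

    def new_rdd_id(self):
        # Each new RDD takes the next integer: 0, 1, 2, ...
        return next(self._next_rdd_id)


def rdds_created_so_far(rdd_id):
    # An RDD with id n was preceded by ids 0..n-1, so it is the (n + 1)-th RDD.
    return rdd_id + 1


ctx = ToyContext()
ids = [ctx.new_rdd_id() for _ in range(806)]
print(ids[-1])                   # id of the 806th RDD: 805
print(rdds_created_so_far(805))  # 806
```

In real PySpark you can read the same number from an RDD object with `rdd.id()`.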


> 2.) What code is "RDD at PythonRDD.scala:43" referring to? Is there any
> way to tie this back to lines of code we've written in pyspark?
>

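It's called a CallSite, and it shows where in the code the RDD was created.
Given the file name and the line number, you can look the code up yourself.
Note that PythonRDD.scala is part of Spark's own Scala sources (package
org.apache.spark.api.python), not one of your Python files.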
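If you want to pull the file and line out of the DAG labels programmatically, a minimal sketch could look like the following. `parse_node` and `parse_call_site` are hypothetical helpers, and the regexes assume labels shaped like "PythonRDD[805]" and "RDD at PythonRDD.scala:43"; they are not a Spark API.

```python
import re

# Assumed label shapes: "<Class>[<id>]" and "<operation> at <file>:<line>".
_NODE_RE = re.compile(r"^(?P<cls>\w+)\s*\[(?P<id>\d+)\]$")
_CALLSITE_RE = re.compile(r"^(?P<op>.+?) at (?P<file>[^\s:]+):(?P<line>\d+)$")


def parse_node(label):
    """Split 'PythonRDD[805]' into the RDD class name and its id."""
    m = _NODE_RE.match(label.strip())
    if m is None:
        raise ValueError(f"not an RDD node label: {label!r}")
    return m.group("cls"), int(m.group("id"))


def parse_call_site(label):
    """Split 'RDD at PythonRDD.scala:43' into (operation, file, line)."""
    m = _CALLSITE_RE.match(label.strip())
    if m is None:
        raise ValueError(f"not a call-site label: {label!r}")
    return m.group("op"), m.group("file"), int(m.group("line"))


print(parse_node("PythonRDD[805]"))               # ('PythonRDD', 805)
print(parse_call_site("RDD at PythonRDD.scala:43"))  # ('RDD', 'PythonRDD.scala', 43)
```

The patterns are deliberately loose; adjust them if your Spark version formats the labels differently.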

Jacek
