On Thu, Jul 21, 2016 at 2:56 AM, C. Josephson <cjos...@uhana.io> wrote:
> I just started looking at the DAG for a Spark Streaming job, and had a
> couple of questions about it (image inline).
>
> 1.) What do the numbers in brackets mean, e.g. PythonRDD[805]?

Every RDD has an identifier (its id attribute) that is unique within a
SparkContext, which is the broadest scope an RDD can belong to. Ids are
assigned sequentially starting from 0, so PythonRDD[805] means that
context had already created at least 806 RDDs by the time this one was
built.

> 2.) What code is "RDD at PythonRDD.scala:43" referring to? Is there any
> way to tie this back to lines of code we've written in pyspark?

That's called a CallSite; it shows the source file and line number the
entry comes from. You can look up the code yourself given the file name
and the line number (here, line 43 of PythonRDD.scala).

Jacek
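
P.S. As a rough illustration of the id numbering in answer 1, here is a toy model in plain Python. It is not Spark's code: ToyContext and ToyRDD are hypothetical names, and the only behaviour modelled is a per-context counter that hands out ids starting at 0.

```python
import itertools


class ToyContext:
    """Hypothetical stand-in for a SparkContext: it owns the id counter."""

    def __init__(self):
        # Each context has its own counter, starting at 0.
        self._next_id = itertools.count(0)

    def new_rdd(self, name):
        return ToyRDD(self, name)


class ToyRDD:
    """Hypothetical stand-in for an RDD: takes the next id from its context."""

    def __init__(self, context, name):
        self.context = context
        self.name = name
        self.id = next(context._next_id)

    def __repr__(self):
        # Mimics the "Name[id]" label quoted in the question.
        return f"{self.name}[{self.id}]"


ctx = ToyContext()
rdds = [ctx.new_rdd("PythonRDD") for _ in range(806)]
last = rdds[-1]
print(last)   # the 806th RDD created in this context

other = ToyContext().new_rdd("PythonRDD")
print(other)  # a separate context starts counting again
```

In this toy model the 806th object is labelled PythonRDD[805], and a fresh context starts again at PythonRDD[0], which is why the id only tells you about RDDs created within the same context.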