    val rows = myDataSourcePartition.rowCount
    val partitionData = 1 to rows map (r =>
      Row(s"Partition: ${partitionId}, row ${r} of ${rows}"))
    partitionData.iterator
  }
}
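(For context, here is a minimal self-contained sketch of what the full custom RDD might look like, reconstructed around the fragment above. The names MyDataSourcePartition, numPartitions, and rowsPerPartition are assumptions for illustration, not necessarily the original poster's code.)

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    // Hypothetical partition descriptor carrying the row count used above.
    class MyDataSourcePartition(override val index: Int, val rowCount: Int)
      extends Partition

    class MyDataSourceRDD(sc: SparkContext, numPartitions: Int, rowsPerPartition: Int)
      extends RDD[Row](sc, Nil) {

      // Driver side: describe the partitions without touching any data.
      override def getPartitions: Array[Partition] =
        (0 until numPartitions)
          .map(i => new MyDataSourcePartition(i, rowsPerPartition): Partition)
          .toArray

      // Executor side: produce the rows for one partition.
      override def compute(split: Partition, context: TaskContext): Iterator[Row] = {
        val myDataSourcePartition = split.asInstanceOf[MyDataSourcePartition]
        val partitionId = myDataSourcePartition.index
        val rows = myDataSourcePartition.rowCount
        val partitionData = 1 to rows map (r =>
          Row(s"Partition: ${partitionId}, row ${r} of ${rows}"))
        partitionData.iterator
      }
    }

From the driver this would be used like any other RDD, e.g. new MyDataSourceRDD(sc, 4, 10).collect().foreach(println).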
From: Wenchen Fan
Date: Wednesday, February 28, 2018 at 12:25 PM
To: "Thakrar, Jayesh"
Cc: "dev@spark.apache.org"
Subject: Re: SparkContext - parameter for RDD …
My understanding:

An RDD is also a driver-side object, like SparkContext; it works as a handle to your distributed data on the cluster.

However, `RDD.compute` (which defines how to produce the data for each partition) needs to be executed on the remote nodes. It's more convenient to make the RDD serializable, and …
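(A minimal sketch of the pattern being described; this is an assumed illustration, not Spark's actual source, though Spark's abstract RDD declares its context parameter along these lines. The RDD itself is Serializable so it can be shipped inside tasks, while the driver-only SparkContext reference is marked @transient so it is dropped during serialization rather than dragged to the executors:)

    import org.apache.spark.SparkContext

    abstract class SketchRDD[T](@transient private val sc: SparkContext)
      extends Serializable {

      // Shipped to and executed on remote nodes; it must not rely on `sc`,
      // which is null once the RDD has been deserialized on an executor.
      def compute(partitionIndex: Int): Iterator[T]

      // Driver-only helpers may use `sc` freely before serialization.
      def defaultParallelism: Int = sc.defaultParallelism
    }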
Hi All,

I was just toying with creating a very rudimentary RDD data source to understand the inner workings of RDDs.

It seems that one of the constructors for RDD has a parameter of type SparkContext, but it (apparently) exists on the driver only and is not serializable. Consequently, any attempt …
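(The message is truncated here; presumably it goes on to describe the failure such attempts hit. As an illustrative sketch, assumed rather than taken from the original post, capturing the SparkContext in a closure fails when Spark checks the closure for serializability:)

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkContextCaptureDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("demo").setMaster("local[2]"))
        val rdd = sc.parallelize(1 to 10)

        // Throws org.apache.spark.SparkException: Task not serializable,
        // because the closure captures `sc` and SparkContext cannot be serialized:
        // rdd.map(x => sc.defaultParallelism + x).collect()

        // Works: copy the driver-side value into a local, serializable variable
        // before building the closure.
        val parallelism = sc.defaultParallelism
        println(rdd.map(x => parallelism + x).collect().mkString(", "))

        sc.stop()
      }
    }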