    val rows = myDataSourcePartition.rowCount
    val partitionData = 1 to rows map (r =>
      Row(s"Partition: ${partitionId}, row ${r} of ${rows}"))
    partitionData.iterator
  }
}
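(For context, here is a minimal self-contained sketch of what the full custom RDD might look like, reconstructed around the fragment above. The names MyDataSourcePartition, numPartitions, and rowsPerPartition are assumptions for illustration, not necessarily the original poster's code.)

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    // Hypothetical partition descriptor carrying the row count used above.
    class MyDataSourcePartition(override val index: Int, val rowCount: Int)
      extends Partition

    class MyDataSourceRDD(sc: SparkContext, numPartitions: Int, rowsPerPartition: Int)
      extends RDD[Row](sc, Nil) {

      // Driver side: describe the partitions without touching any data.
      override def getPartitions: Array[Partition] =
        (0 until numPartitions)
          .map(i => new MyDataSourcePartition(i, rowsPerPartition): Partition)
          .toArray

      // Executor side: produce the rows for one partition.
      override def compute(split: Partition, context: TaskContext): Iterator[Row] = {
        val myDataSourcePartition = split.asInstanceOf[MyDataSourcePartition]
        val partitionId = myDataSourcePartition.index
        val rows = myDataSourcePartition.rowCount
        val partitionData = 1 to rows map (r =>
          Row(s"Partition: ${partitionId}, row ${r} of ${rows}"))
        partitionData.iterator
      }
    }

From the driver this would be used like any other RDD, e.g. new MyDataSourceRDD(sc, 4, 10).collect().foreach(println).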
From: Wenchen Fan
Date: Wednesday, February 28, 2018 at 12:25 PM
To: "Thakrar, Jayesh"
Cc: "dev@spark.apache.org"
Subject: Re: SparkContext - parameter for RDD …
My understanding:

An RDD is also a driver-side object, like SparkContext; it works as a handle to your distributed data on the cluster.

However, `RDD.compute` (which defines how to produce the data for each partition) needs to be executed on the remote nodes. It's more convenient to make the RDD serializable, and …
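(A minimal sketch of the pattern being described; this is an assumed illustration, not Spark's actual source, though Spark's abstract RDD declares its context parameter along these lines. The RDD itself is Serializable so it can be shipped inside tasks, while the driver-only SparkContext reference is marked @transient so it is dropped during serialization rather than dragged to the executors:)

    import org.apache.spark.SparkContext

    abstract class SketchRDD[T](@transient private val sc: SparkContext)
      extends Serializable {

      // Shipped to and executed on remote nodes; it must not rely on `sc`,
      // which is null once the RDD has been deserialized on an executor.
      def compute(partitionIndex: Int): Iterator[T]

      // Driver-only helpers may use `sc` freely before serialization.
      def defaultParallelism: Int = sc.defaultParallelism
    }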
Hi All,

I was just toying with creating a very rudimentary RDD data source to understand the inner workings of RDDs.

It seems that one of the constructors for RDD has a parameter of type SparkContext, but it (apparently) exists on the driver only and is not serializable. Consequently, any attempt …
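(The message is truncated here; presumably it goes on to describe the failure such attempts hit. As an illustrative sketch, assumed rather than taken from the original post, capturing the SparkContext in a closure fails when Spark checks the closure for serializability:)

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkContextCaptureDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("demo").setMaster("local[2]"))
        val rdd = sc.parallelize(1 to 10)

        // Throws org.apache.spark.SparkException: Task not serializable,
        // because the closure captures `sc` and SparkContext cannot be serialized:
        // rdd.map(x => sc.defaultParallelism + x).collect()

        // Works: copy the driver-side value into a local, serializable variable
        // before building the closure.
        val parallelism = sc.defaultParallelism
        println(rdd.map(x => parallelism + x).collect().mkString(", "))

        sc.stop()
      }
    }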