Hi Patrick,
What if all the data has to be kept in cache all the time? If applying union
results in a new RDD, then caching it would keep both the older RDD and the new
one in memory, duplicating the data.
Below is what I understood from your comment.
sqlContext.cacheTable(existingRDD)// caches t
You can use mapPartitions to achieve this.
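A minimal standalone sketch of the per-partition splitting step, in plain Scala with no Spark dependency. The object and method names (`PartitionSplitter`, `splitPartition`) are hypothetical, and the round-robin assignment of elements to parts is an assumption, since the message does not show how elements are distributed. Round-robin keeps the parts "equal" in the sense that their sizes differ by at most one.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper (not from the original message): splits one partition's
// elements into `numParts` buckets, assigning elements round-robin (assumption),
// and tags each bucket with its index, which serves as the part id.
object PartitionSplitter {
  def splitPartition[T](itr: Iterator[T], numParts: Int): Iterator[(Int, Seq[T])] = {
    require(numParts > 0, "numParts must be positive")
    // Array.fill creates numParts independent, empty buffers.
    val buckets = Array.fill(numParts)(ArrayBuffer.empty[T])
    var i = 0
    for (elem <- itr) {
      buckets(i % numParts) += elem
      i += 1
    }
    // Pair each bucket with its index so the index becomes the part id.
    buckets.iterator.zipWithIndex.map { case (buf, id) => (id, buf.toSeq) }
  }
}
```

Because the function has the shape `Iterator[T] => Iterator[U]`, it could be passed to Spark as `rdd.mapPartitions(itr => PartitionSplitter.splitPartition(itr, 10))`. Note that it buffers the whole partition in memory before emitting the parts.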
import scala.collection.mutable.ArrayBuffer
// Split each partition into 10 equal parts, each part having its number as its id
val splittedRDD = self.mapPartitions((itr) => {
// Iterate over this iterator and break it into 10 parts.
// Array.fill creates 10 separate empty buffers.
val iterators = Array.fill(10)(ArrayBuffer[T]())
var i = 0
for(tuple