Sorry if this is a dumb question, but why not make several sequential calls to mapPartitions? Are you trying to avoid function serialization, or does your function damage the partitions?
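
To make the suggestion concrete, here is a minimal sketch, assuming each pass can work element by element. The names `parse` and `square` are made up for illustration, and plain Scala iterators stand in for a single partition so the snippet runs without Spark. Consecutive mapPartitions calls are normally pipelined within one stage, so chaining them this way should not need an intermediate array.

```scala
// Sketch only: a plain Scala Iterator stands in for one Spark partition.
// The pass names below (parse, square) are hypothetical examples.
object SequentialPasses {
  // Pass 1: parse each raw line into a number, lazily.
  def parse(iter: Iterator[String]): Iterator[Double] =
    iter.map(_.trim.toDouble)

  // Pass 2: square each value, lazily.
  def square(iter: Iterator[Double]): Iterator[Double] =
    iter.map(x => x * x)

  // Chaining the passes composes the iterators; each element flows through
  // both functions in turn and nothing is copied into an array.
  def bothPasses(iter: Iterator[String]): Iterator[Double] =
    square(parse(iter))

  // With an RDD[String] named rdd (assumed), the equivalent would be:
  //   rdd.mapPartitions(parse).mapPartitions(square)
}
```

This only helps when no pass needs to see the whole partition before the next one starts. If a later pass depends on an aggregate of the partition (a mean, say), the data has to be either buffered or recomputed.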

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Fri, Jun 13, 2014 at 1:30 AM, zhen <z...@latrobe.edu.au> wrote:

> I want to take multiple passes through my data in mapPartitions. However, the
> iterator only allows you to take one pass through the data. If I transformed
> the iterator into an array using iter.toArray, it is too slow, since it
> copies all the data into a new scala array. Also it takes twice the memory.
> Which is also bad in terms of more GC.
>
> Is there a faster/better way of taking multiple passes without copying all
> the data?
>
> Thank you,
>
> Zhen
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/multiple-passes-in-mapPartitions-tp7555.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.