I want to take multiple passes through my data in mapPartitions. However, the iterator only allows you to take one pass through the data. If I transformed the iterator into an array using iter.toArray, it is too slow, since it copies all the data into a new scala array. Also it takes twice the memory. Which is also bad in terms of more GC.
Is there a faster/better way of taking multiple passes without copying all the data? Thank you, Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/multiple-passes-in-mapPartitions-tp7555.html Sent from the Apache Spark User List mailing list archive at Nabble.com.