Iteration question

2014-07-11 Thread Nathan Kronenfeld
Hi, folks. We're having a problem with iteration that I don't understand. We have the following test code:

    org.apache.log4j.Logger.getLogger("org").setLevel(org.apache.log4j.Level.WARN)
    org.apache.log4j.Logger.getLogger("akka").setLevel(org.apache.log4j.Level.WARN)
    def test (caching: Boolean, p

Re: Iteration question

2014-07-15 Thread Matei Zaharia
Hi Nathan, I think there are two possible reasons for this. One is that even though you are caching RDDs, their lineage chain gets longer and longer, and thus serializing each RDD takes more time. You can cut off the chain by using RDD.checkpoint() periodically, say every 5-10 iterations. The s
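
For illustration, here is a minimal sketch of the periodic-checkpoint idea from the first suggestion above. It is not Nathan's test code, which is cut off in this listing. The object name, the `shouldCheckpoint` helper, the `/tmp/spark-checkpoints` directory, the local master, the interval of 5, and the per-iteration `map` are all assumptions made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CheckpointSketch {
  // Hypothetical helper: true on every `interval`-th iteration (1-based).
  def shouldCheckpoint(iteration: Int, interval: Int): Boolean =
    interval > 0 && iteration % interval == 0

  def main(args: Array[String]): Unit = {
    // Assumed local setup, only so the sketch is self-contained.
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Assumed path; a checkpoint directory must be set before checkpoint() is used.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    var data: RDD[Int] = sc.parallelize(1 to 1000).cache()

    for (i <- 1 to 20) {
      // Placeholder for the real per-iteration transformation.
      val next = data.map(_ + 1).cache()

      // Mark for checkpointing before any action runs on this RDD,
      // so its lineage is truncated once the checkpoint is written.
      if (shouldCheckpoint(i, 5)) next.checkpoint()

      // The action materializes the cache and, when marked, the checkpoint.
      next.count()

      // The previous iteration's cached copy is no longer needed.
      data.unpersist()
      data = next
    }

    sc.stop()
  }
}
```

With an interval of 5, the lineage that has to be serialized never spans more than about five transformations. Whether that accounts for the slowdown in the original test is not something this sketch can show.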