Hi Nathan,
I think there are two possible reasons for this. One is that even though you
are caching the RDDs, their lineage chains keep growing with every iteration,
so serializing each RDD takes longer and longer. You can cut off the chain by
calling RDD.checkpoint() periodically, say every 5-10 iterations. The s
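
For illustration, here is a minimal sketch of the periodic checkpointing suggested above. It is not the code from this thread: the object name CheckpointSketch, the helper iterate, the map(_ + 1) step, the interval parameter, the local[2] master and the /tmp checkpoint directory are all assumptions made up for the example. The only Spark calls it relies on are SparkContext.setCheckpointDir, RDD.cache, RDD.checkpoint, RDD.count, RDD.unpersist and RDD.toDebugString.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CheckpointSketch {
  // Hypothetical helper: runs a trivial per-iteration step and truncates the
  // lineage every `interval` iterations. The checkpoint directory must
  // already be set on the SparkContext that owns `initial`.
  def iterate(initial: RDD[Int], iterations: Int, interval: Int): RDD[Int] = {
    var current = initial.cache()
    for (i <- 1 to iterations) {
      // Stand-in for the real iteration step (assumption).
      val next = current.map(_ + 1).cache()
      if (i % interval == 0) {
        // checkpoint() only marks the RDD; it has to be called before the
        // first action on it, and the data is written when that action runs.
        next.checkpoint()
      }
      next.count()        // action: fills the cache and writes the checkpoint
      current.unpersist() // the previous iteration's copy is no longer needed
      current = next
    }
    current
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("/tmp/spark-checkpoints") // assumed location
    try {
      val result = iterate(sc.parallelize(1 to 1000), iterations = 20, interval = 5)
      // The printed lineage should stop at the most recent checkpoint rather
      // than reach all the way back to the first iteration.
      println(result.toDebugString)
    } finally {
      sc.stop()
    }
  }
}
```

Calling count() right after checkpoint() is a deliberate choice in this sketch: without an action the checkpoint is never written and the lineage keeps growing. Since the RDD is cached first, writing the checkpoint can reuse the cached partitions rather than recompute them.
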
Hi, folks.
We're having a problem with iteration that I don't understand.
We have the following test code:
org.apache.log4j.Logger.getLogger("org").setLevel(org.apache.log4j.Level.WARN)
org.apache.log4j.Logger.getLogger("akka").setLevel(org.apache.log4j.Level.WARN)
def test (caching: Boolean, p