In case a little more information is helpful:

the RDD is constructed using sc.textFile(fileUri) where the fileUri is to a
".gz" file (that's too big to fit on my disk).

I do an rdd.persist(StorageLevel.NONE) and it seems to have no affect.

This rdd is what I'm calling aggregate on and I expect to only use it once.
Each row in the rdd never has to be revisited. The aggregate seqOp is
modifying a "current state" and returning it so there's no need to store the
results of the seqOp on a row-by-row basis, and give the fact that there's
one partition the comboOp doesn't even need to be called (since there would
be nothing to combine across partitions).

Thanks for any help.
Jim




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/No-disk-single-pass-RDD-aggregation-tp20723p20724.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to