Without caching, an RDD will be evaluated multiple times if referenced
multiple times by other RDDs. A silly example:

    val text = sc.textFile("input.log")
    val r1 = text.filter(_ startsWith "ERROR")
    // flatMap rather than map, so r2 is an RDD[String] like r1 and ++ type-checks
    val r2 = text.flatMap(_ split " ")
    val r3 = (r1 ++ r2).collect()

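Here the input file will be scanned twice unless you call .cache() on text,
because r1 and r2 each recompute it. So if your computation involves
nondeterminism (e.g. random number generation), each evaluation can produce
different data, and two actions on the same RDD may give different results.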
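
To make the nondeterminism point concrete, here is a minimal sketch, assuming a local master and a made-up RDD of random doubles (CacheDemo, sameOnBothPasses and noisy are names I invented for illustration, not anything from the original job). It collects the same RDD twice and checks whether the two passes agree:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

object CacheDemo {
  // Hypothetical helper: run two actions on one RDD and compare what they see.
  def sameOnBothPasses(sc: SparkContext, useCache: Boolean): Boolean = {
    // Each evaluation of this map draws fresh random numbers.
    val noisy = sc.parallelize(1 to 1000, 4).map(_ => Random.nextDouble())
    if (useCache) noisy.cache()           // mark for reuse; filled in by the first action
    val first = noisy.collect().toSeq     // first action: computes the RDD
    val second = noisy.collect().toSeq    // second action: recomputes it unless cached
    first == second
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master, just for the demo.
    val conf = new SparkConf().setMaster("local[2]").setAppName("cache-demo")
    val sc = new SparkContext(conf)
    try {
      println(s"without cache, passes agree: ${sameOnBothPasses(sc, useCache = false)}")
      println(s"with cache, passes agree: ${sameOnBothPasses(sc, useCache = true)}")
    } finally sc.stop()
  }
}
```

Note that cache() is best effort: if a cached partition is evicted from memory it gets recomputed, so for results that must be reproducible it is safer to make the computation itself deterministic (e.g. a fixed seed per partition) or write the data out once and read it back.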


On Tue, Apr 22, 2014 at 11:30 AM, randylu <randyl...@gmail.com> wrote:

> it's ok when i call doc_topic_dist.cache() firstly.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/two-calls-of-saveAsTextFile-have-different-results-on-the-same-RDD-tp4578p4580.html
