Re: two calls of saveAsTextFile() have different results on the same RDD

Cheng Lian Wed, 23 Apr 2014 00:01:36 -0700

Without caching, an RDD will be evaluated multiple times if referenced
multiple times by other RDDs. A silly example:

    val text = sc.textFile("input.log")
    val r1 = text.filter(_ startsWith "ERROR")
    // flatMap keeps r2 an RDD[String], so it can be unioned with r1
    val r2 = text.flatMap(_ split " ")
    val r3 = (r1 ++ r2).collect()

Here the input file will be scanned twice unless you call .cache() on text.
So if your computation involves nondeterminism (e.g. random number
generation), each evaluation may produce different results.
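
To make the effect concrete, here is a minimal sketch, assuming a local
SparkContext. The names CacheDemo and runsAgree are made up for this
illustration. It runs two actions on an RDD whose map step draws unseeded
random numbers and reports whether both actions saw the same data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

object CacheDemo {
  // Hypothetical helper: runs two actions on the same RDD and reports
  // whether both actions returned identical elements.
  def runsAgree(sc: SparkContext, useCache: Boolean): Boolean = {
    // Nondeterministic step: every evaluation of this map draws new numbers.
    val base = sc.parallelize(1 to 1000, 2).map(i => (i, Random.nextLong()))
    val rdd = if (useCache) base.cache() else base

    // Each collect() is a separate action. Without cache(), the lineage
    // (including the random map) is recomputed for the second action.
    val first = rdd.collect().toList
    val second = rdd.collect().toList
    first == second
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"uncached runs agree: ${runsAgree(sc, useCache = false)}")
      println(s"cached runs agree:   ${runsAgree(sc, useCache = true)}")
    } finally {
      sc.stop()
    }
  }
}
```

Note that cache() is best effort: if a cached partition is evicted, Spark
recomputes it from the lineage, so a seeded or otherwise deterministic
computation is the more robust fix when results must be reproducible.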

On Tue, Apr 22, 2014 at 11:30 AM, randylu <randyl...@gmail.com> wrote:

> It works correctly when I call doc_topic_dist.cache() first.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/two-calls-of-saveAsTextFile-have-different-results-on-the-same-RDD-tp4578p4580.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
