Re: distinct on huge dataset

Andrew Ash Sat, 22 Mar 2014 22:40:26 -0700

FWIW I've seen correctness errors with spark.shuffle.spill on 0.9.0 and
have it disabled now. The specific error behavior was that a join would
consistently return one count of rows with spill enabled and another count
with it disabled.
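
For anyone who wants to check for the same discrepancy, here is a minimal sketch of the comparison, not the actual job that showed the problem. The object name, the helper joinCount, the local master, and the placeholder data are all assumptions for illustration. Data this small will not spill, so the real join inputs would need to be substituted.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed for join on 0.9.x

object SpillJoinCheck {
  // Hypothetical helper: runs the same join with spark.shuffle.spill
  // switched on or off and returns the number of joined rows.
  def joinCount(spill: Boolean): Long = {
    val conf = new SparkConf()
      .setMaster("local[2]") // assumption: a local run, not a real cluster
      .setAppName("spill-join-check")
      .set("spark.shuffle.spill", spill.toString)
    val sc = new SparkContext(conf)
    try {
      // Placeholder inputs; replace with the RDDs from the real job.
      val left = sc.parallelize(1 to 100000).map(i => (i % 1000, i))
      val right = sc.parallelize(1 to 1000).map(i => (i % 1000, i.toString))
      left.join(right).count()
    } finally {
      sc.stop() // stop the context so the next run can create its own
    }
  }

  def main(args: Array[String]): Unit = {
    val withSpill = joinCount(spill = true)
    val withoutSpill = joinCount(spill = false)
    println(s"spill=true: $withSpill rows, spill=false: $withoutSpill rows")
  }
}
```

If the two counts differ on the same input, that matches the behavior described above.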


Sent from my mobile phone
On Mar 22, 2014 1:52 PM, "Kane" <kane.ist...@gmail.com> wrote:

> But I was wrong: map also fails on a big file, and setting
> spark.shuffle.spill doesn't help. Map fails with the same error.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/distinct-on-huge-dataset-tp3025p3039.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
