I think Xuefeng Wu's suggestion is likely correct. This difference is more
likely explained by the compression library changing versions than by sort vs.
hash shuffle (which should not affect output size significantly). Others
have reported that switching to lz4 fixed their issue.

We should document this if it turns out to be the case. I wonder if we're
asking Snappy to be super-low-overhead and, as a result, the new version
does a better job of it (less overhead, less compression).
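For anyone who wants to try the lz4 workaround mentioned above, a sketch of the relevant settings follows; these are the standard Spark 1.x properties (set in spark-defaults.conf or via --conf on spark-submit), and the values shown are illustrative, not a recommendation:

```
# Switch the block compression codec used for shuffle/spill from the
# default (snappy in 1.x) to lz4:
spark.io.compression.codec    lz4

# To compare against the pre-1.2 behavior, the shuffle implementation
# can be reverted from the 1.2 default (sort) to hash:
spark.shuffle.manager         hash
```

Changing either setting only requires restarting the application, so it should be cheap to A/B the shuffle write sizes with each combination.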

On Sat, Feb 14, 2015 at 9:32 AM, Peng Cheng <pc...@uow.edu.au> wrote:

> I double check the 1.2 feature list and found out that the new sort-based
> shuffle manager has nothing to do with HashPartitioner :-< Sorry for the
> misinformation.
>
> On the other hand, this may explain the increase in shuffle spill as a side
> effect of the new shuffle manager. Let me revert spark.shuffle.manager to
> hash and see if it makes things better (or worse, as the benchmark in
> https://issues.apache.org/jira/browse/SPARK-3280 indicates)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tp20894p21657.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
