It can be set in an individual application.
Consolidation had some issues on ext3 as mentioned there, though we might
enable it by default in the future because other optimizations have now made
it perform on par with the non-consolidation version. It also had some bugs
in 0.9.0, so I'd suggest at least upgrading to a newer release before
enabling it.
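For what it's worth, setting it per-application would look something like the following sketch (assuming Spark 1.x, where this property still exists, and the plain SparkConf API; the app name here is made up):

```scala
// Sketch: enabling shuffle file consolidation for one application only,
// via SparkConf at SparkContext creation time. Assumes Spark 1.x;
// "sort-job" is a placeholder app name.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("sort-job")
  .set("spark.shuffle.consolidateFiles", "true")
val sc = new SparkContext(conf)
```

A setting made this way applies only to that application's executors, not to the whole cluster.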
Thanks, I missed that.
One thing that's still unclear to me, even looking at that, is: does this
parameter have to be set when starting up the cluster, on each of the
workers, or can it be set by an individual client job?
On Fri, May 23, 2014 at 10:13 AM, Han JU wrote:
Hi Nathan,
There's some explanation in the spark configuration section:
```
If set to "true", consolidates intermediate files created during a shuffle.
Creating fewer files can improve filesystem performance for shuffles with
large numbers of reduce tasks. It is recommended to set this to "true"
when using ext4 or xfs filesystems. On ext3, this option might degrade
performance on machines with many (>8) cores due to filesystem limitations.
```
In trying to sort some largish datasets, we came across the
spark.shuffle.consolidateFiles property, and I found in the source code
that it is set, by default, to false, with a note to default it to true
when the feature is stable.
Does anyone know what is unstable about this? If we set it to true, what
problems should we expect?