I observed the following.
If job commit is done on the driver and task commit is done on the executors,
then with speculation enabled, data loss may occur.
Job commit calls listStatus, while task commit deletes the output
file if it already exists and then renames the task output to the final output.
When listStatus is called after the delete and before the re
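
The interleaving described above can be illustrated with a toy in-memory model. This is only a sketch: `FakeFS`, `commit_task_steps`, and the paths are hypothetical names made up for illustration, not Spark or Hadoop code, and it assumes the task commit is a plain delete followed by a rename.

```python
# Toy in-memory model of the race; not Spark or Hadoop code.
class FakeFS:
    def __init__(self):
        self.files = {}

    def list_status(self, prefix):
        # Return every path currently visible under the prefix.
        return sorted(p for p in self.files if p.startswith(prefix))

    def delete(self, path):
        self.files.pop(path, None)

    def rename(self, src, dst):
        self.files[dst] = self.files.pop(src)


def commit_task_steps(fs, tmp, final):
    """Yield after each step so a caller can interleave another action."""
    fs.delete(final)       # remove the output of an earlier attempt
    yield "deleted"
    fs.rename(tmp, final)  # move this attempt's output into place
    yield "renamed"


fs = FakeFS()
fs.files["out/part-0"] = "attempt-1"            # first attempt already committed
fs.files["tmp/attempt-2/part-0"] = "attempt-2"  # speculative attempt

steps = commit_task_steps(fs, "tmp/attempt-2/part-0", "out/part-0")
next(steps)                                  # speculative commit has deleted the file
seen_by_job_commit = fs.list_status("out/")  # job commit lists inside the window
next(steps)                                  # rename finishes too late
after = fs.list_status("out/")
```

In this model the listing taken inside the window is empty even though the file is present again once the rename completes.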
No, these pools are not created per job but per parfor worker and
thus, used to execute many jobs. For all scripts with a single
top-level parfor this is equivalent to static initialization. However,
yes, we create these pools dynamically on demand to avoid unnecessary
initialization and handle scen
Well, the point was "in a programmatic way without the need for
additional configuration files which is a hassle for a library" -
anyway, I appreciate your comments.
Regards,
Matthias
On Sat, Apr 7, 2018 at 3:43 PM, Mark Hamstra wrote:
>> Providing a way to set the mode of the default scheduler
>
> Providing a way to set the mode of the default scheduler would be awesome.
That's trivial: Just use the pool configuration XML file and define a pool
named "default" with the characteristics that you want (including
schedulingMode FAIR).
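
As a rough sketch of that suggestion, the pool file could also be generated from code rather than maintained by hand. The helper name `write_default_pool_file` and the weight and minShare values are assumptions for illustration; the element names follow Spark's fair scheduler allocation file format.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_default_pool_file(directory, mode="FAIR", weight=1, min_share=0):
    """Write a fair scheduler allocation file defining a pool named "default"."""
    root = ET.Element("allocations")
    pool = ET.SubElement(root, "pool", name="default")
    ET.SubElement(pool, "schedulingMode").text = mode
    ET.SubElement(pool, "weight").text = str(weight)      # assumed value
    ET.SubElement(pool, "minShare").text = str(min_share)  # assumed value
    path = os.path.join(directory, "fairscheduler.xml")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


# Hypothetical usage: write the file into a temporary directory.
path = write_default_pool_file(tempfile.mkdtemp())
```

The resulting path would then be passed to Spark through spark.scheduler.allocation.file, with spark.scheduler.mode set to FAIR.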
You only get the default construction of the pool name
Sorry, but I'm still not understanding this use case. Are you somehow
creating additional scheduling pools dynamically as Jobs execute? If so,
that is a very unusual thing to do. Scheduling pools are intended to be
statically configured -- initialized, living and dying with the
Application.
On Sat
Thanks for the clarification Imran - that helped. I was mistakenly
assuming that these pools are removed via weak references, as the
ContextCleaner does for RDDs, broadcasts, and accumulators, etc. For
the time being, we'll just work around it, but I'll file a
nice-to-have improvement JIRA. Also, y