Re: Fair scheduler pool leak

2018-04-07 Thread Matthias Boehm
Thanks for the clarification Imran - that helped. I was mistakenly assuming that these pools are removed via weak references, as the ContextCleaner does for RDDs, broadcasts, and accumulators, etc. For the time being, we'll just work around it, but I'll file a nice-to-have improvement JIRA. Also, y

Re: Fair scheduler pool leak

2018-04-07 Thread Mark Hamstra
Sorry, but I'm still not understanding this use case. Are you somehow creating additional scheduling pools dynamically as Jobs execute? If so, that is a very unusual thing to do. Scheduling pools are intended to be statically configured -- initialized, living and dying with the Application. On Sat

Re: Fair scheduler pool leak

2018-04-07 Thread Mark Hamstra
> > Providing a way to set the mode of the default scheduler would be awesome.
That's trivial: just use the pool configuration XML file and define a pool named "default" with the characteristics that you want (including schedulingMode FAIR). You only get the default construction of the pool name
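A minimal sketch of the allocation file described in this reply, generated from code so that a library would not have to ship a static XML file. The element names (allocations, pool, schedulingMode, weight, minShare) follow Spark's fair scheduler file format; the helper names build_allocations_xml and write_allocations_file, and the use of a temporary file, are assumptions made for illustration and are not taken from the thread.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def build_allocations_xml(pool_name="default", scheduling_mode="FAIR",
                          weight=1, min_share=0):
    """Return the text of a fair scheduler allocation file with one pool."""
    root = ET.Element("allocations")
    pool = ET.SubElement(root, "pool", name=pool_name)
    ET.SubElement(pool, "schedulingMode").text = scheduling_mode
    ET.SubElement(pool, "weight").text = str(weight)
    ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


def write_allocations_file(xml_text):
    """Write the XML to a temporary file and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(prefix="fairscheduler-", suffix=".xml")
    with os.fdopen(fd, "w") as handle:
        handle.write(xml_text)
    return path


xml_text = build_allocations_xml()
allocation_path = write_allocations_file(xml_text)
print(xml_text)
print(allocation_path)
```

The resulting path would then be handed to Spark before the context is created, through the spark.scheduler.allocation.file property together with spark.scheduler.mode=FAIR. Whether generating the file at runtime is acceptable for a given library is a separate design question.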

Re: Fair scheduler pool leak

2018-04-07 Thread Matthias Boehm
Well, the point was "in a programmatic way without the need for additional configuration files, which is a hassle for a library" - anyway, I appreciate your comments. Regards, Matthias On Sat, Apr 7, 2018 at 3:43 PM, Mark Hamstra wrote: >> Providing a way to set the mode of the default scheduler

Re: Fair scheduler pool leak

2018-04-07 Thread Matthias Boehm
No, these pools are not created per job but per parfor worker, and are thus used to execute many jobs. For all scripts with a single top-level parfor this is equivalent to static initialization. However, yes, we create these pools dynamically on demand to avoid unnecessary initialization and handle scen
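A toy model of why on-demand pools can accumulate, assuming (as stated elsewhere in this thread) that a pool is created the first time its name is used and is never removed afterwards. ToyPoolRegistry, run_parfor, and both naming schemes are hypothetical and only illustrate the counting; this is not Spark code. In Spark itself a thread selects its pool with sc.setLocalProperty("spark.scheduler.pool", name).

```python
class ToyPoolRegistry:
    """Keeps every pool that was ever requested; nothing is removed."""

    def __init__(self):
        self.pools = {}

    def submit(self, pool_name, job):
        # The pool is created on first use, mirroring on-demand creation.
        self.pools.setdefault(pool_name, []).append(job)


def run_parfor(registry, run_id, num_workers, jobs_per_worker, name_for):
    """Each worker submits several jobs to the pool chosen by name_for."""
    for worker in range(num_workers):
        pool_name = name_for(run_id, worker)
        for job in range(jobs_per_worker):
            registry.submit(pool_name, (run_id, worker, job))


unique_names = ToyPoolRegistry()   # a fresh pool name per parfor run and worker
reused_names = ToyPoolRegistry()   # the same worker-indexed names on every run

for run_id in range(5):
    run_parfor(unique_names, run_id, 4, 3, lambda r, w: f"parfor-{r}-{w}")
    run_parfor(reused_names, run_id, 4, 3, lambda r, w: f"parfor-{w}")

print(len(unique_names.pools))  # 20 pools: 5 runs x 4 workers
print(len(reused_names.pools))  # 4 pools: one per worker index
```

Reusing a bounded set of pool names is only one conceivable workaround; the thread does not say which workaround was actually chosen.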

Re: saveAsNewAPIHadoopDataset must not enable speculation for parquet file?

2018-04-07 Thread 周康
I have observed this. The job commit runs on the driver and the task commit runs on the executors. With speculation enabled, this may cause data loss, because the job commit calls listStatus, while the task commit deletes the output file if it already exists and then renames its own output to the final output. When listStatus is called after the delete and before re
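A toy model of the sequence described above, assuming only what the message states: a task commit deletes an existing final file and then renames its own output into place, while a listing may be taken in between. ToyFileSystem, commit_task, and all paths are hypothetical; this is an in-memory illustration, not Hadoop or Spark code.

```python
class ToyFileSystem:
    """In-memory stand-in for a file system: maps path -> contents."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def exists(self, path):
        return path in self.files

    def delete(self, path):
        del self.files[path]

    def rename(self, src, dst):
        self.files[dst] = self.files.pop(src)

    def list_status(self, prefix):
        return sorted(p for p in self.files if p.startswith(prefix))


def commit_task(fs, temp_path, final_path, between=None):
    """Delete an existing final file, then rename the attempt's output into place."""
    if fs.exists(final_path):
        fs.delete(final_path)
    if between is not None:
        between()  # models another actor running between delete and rename
    fs.rename(temp_path, final_path)


fs = ToyFileSystem()
final_path = "out/part-00000"

# The first attempt finishes and commits its output.
fs.write("tmp/attempt_0/part-00000", "rows")
commit_task(fs, "tmp/attempt_0/part-00000", final_path)

# A speculative attempt of the same task commits later; a listing is taken
# in the window between its delete and its rename.
seen_during_commit = []
fs.write("tmp/attempt_1/part-00000", "rows")
commit_task(fs, "tmp/attempt_1/part-00000", final_path,
            between=lambda: seen_during_commit.extend(fs.list_status("out/")))

print(seen_during_commit)          # [] : the listing missed the file
print(fs.list_status("out/"))      # ['out/part-00000']
```

In this model a listing taken inside that window sees no output file at all, even though the file exists both before and after the second commit.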