Re: shuffle files not deleted after executor restarted

2016-09-02 Thread Artur Sukhenko
I believe in your case it will help, as the executor's shuffle files will be managed by the external service. It is described in the Spark docs: graceful-decommission-of-executors Artur On Fri, Sep 2, 2016 at 1:01

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread 汪洋
> On Sep 2, 2016, at 5:58 PM, 汪洋 wrote: > > Yeah, using external shuffle service is a reasonable choice but I think we > will still face the same problems. We use SSDs to store shuffle files for > performance considerations. If the shuffle files are not going to be used > anymore, we want them to be dele

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread 汪洋
Yeah, using external shuffle service is a reasonable choice but I think we will still face the same problems. We use SSDs to store shuffle files for performance considerations. If the shuffle files are not going to be used anymore, we want them to be deleted instead of taking up valuable SSD spa

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread Artur Sukhenko
Hi Yang, Isn't external shuffle service better for long running applications? "It runs as a standalone application and manages shuffle output files so they are available for executors at all time" It is described here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-Extern

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread 汪洋
Thank you for your response. We are using spark-1.6.2 in standalone deploy mode with dynamic allocation disabled. I have traced the code. IMHO, it seems this cleanup is not handled by shutdown hooks directly. The shutdown hooks only send an “ExecutorStateChanged” message to the worker and if th

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread Sun Rui
Hi, Could you give more information about your Spark environment: cluster manager, Spark version, whether dynamic allocation is used, etc.? Generally, executors will delete temporary directories for shuffle files on exit because JVM shutdown hooks are registered, unless they are brutally killed. Y