Can you show the stack trace? The log message came from DiskBlockObjectWriter#revertPartialWritesAndClose(). Unfortunately, that method doesn't propagate the exception, making it hard for the caller to detect the disk-full condition.
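To illustrate the point: the failure mode looks like the common swallow-and-log cleanup pattern, where an IOException raised during cleanup is caught and logged but never rethrown, so the caller sees the call "succeed" even though the disk is full. The sketch below is a hypothetical, minimal reconstruction of that pattern (the class and method names are illustrative, not Spark's actual implementation):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SwallowAndLogDemo {

    // Stand-in for a logger; collects messages so the effect is visible.
    static final List<String> log = new ArrayList<>();

    // Hypothetical cleanup method mirroring the pattern described above:
    // it catches the IOException and logs it, but does not rethrow,
    // so the disk-full condition never reaches the caller.
    static void revertPartialWrites() {
        try {
            // Simulate the write failing because the disk is full.
            throw new IOException("No space left on device");
        } catch (IOException e) {
            log.add("Uncaught exception while reverting partial writes: "
                    + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // The call returns normally; the caller cannot tell anything failed.
        revertPartialWrites();
        System.out.println(log.get(0));
    }
}
```

Because the exception is consumed inside the cleanup method, the only evidence of the problem is the ERROR line in the executor log, which matches what you are seeing.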
On Thu, Mar 31, 2016 at 11:32 AM, Abhishek Anand <abhis.anan...@gmail.com> wrote:
>
> Hi,
>
> Why is it that when the disk space is full on one of the workers, the
> executor on that worker becomes unresponsive and the jobs on that worker
> fail with the exception
>
> 16/03/29 10:49:00 ERROR DiskBlockObjectWriter: Uncaught exception while
> reverting partial writes to file
> /data/spark-e2fc248f-a212-4a99-9d6c-4e52d6a69070/executor-37679a6c-cb96-451e-a284-64d6b4fe9910/blockmgr-f8ca72f4-f329-468b-8e65-ef97f8fb285c/38/temp_shuffle_8f266d70-3fc6-41e5-bbaa-c413a7b08ea4
> java.io.IOException: No space left on device
>
> This is leading to my job getting stuck.
>
> As a workaround I have to kill the executor and clear the space on disk;
> a new executor is then relaunched by the worker and the failed stages are
> recomputed.
>
> How can I get rid of this problem, i.e. why does my job get stuck on a
> disk-full issue on one of the workers?
>
> Cheers !!!
> Abhi