This is what I am getting in the executor logs:

16/03/29 10:49:00 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /data/spark-e2fc248f-a212-4a99-9d6c-4e52d6a69070/executor-37679a6c-cb96-451e-a284-64d6b4fe9910/blockmgr-f8ca72f4-f329-468b-8e65-ef97f8fb285c/38/temp_shuffle_8f266d70-3fc6-41e5-bbaa-c413a7b08ea4
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:315)
        at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:274)
It happens every time the disk is full.

On Fri, Apr 1, 2016 at 2:18 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you show the stack trace?
>
> The log message came from
> DiskBlockObjectWriter#revertPartialWritesAndClose().
> Unfortunately, the method doesn't throw an exception, making it a bit hard
> for the caller to know about the disk-full condition.
>
> On Thu, Mar 31, 2016 at 11:32 AM, Abhishek Anand <abhis.anan...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Why is it that when the disk space is full on one of the workers, the
>> executor on that worker becomes unresponsive and the jobs on that worker
>> fail with the exception below?
>>
>> 16/03/29 10:49:00 ERROR DiskBlockObjectWriter: Uncaught exception while
>> reverting partial writes to file
>> /data/spark-e2fc248f-a212-4a99-9d6c-4e52d6a69070/executor-37679a6c-cb96-451e-a284-64d6b4fe9910/blockmgr-f8ca72f4-f329-468b-8e65-ef97f8fb285c/38/temp_shuffle_8f266d70-3fc6-41e5-bbaa-c413a7b08ea4
>> java.io.IOException: No space left on device
>>
>> This is leading to my job getting stuck.
>>
>> As a workaround I have to kill the executor and clear the space on disk;
>> a new executor is then relaunched by the worker and the failed stages are
>> recomputed.
>>
>> How can I get rid of this problem, i.e., why does my job get stuck on a
>> disk-full issue on one of the workers?
>>
>> Cheers !!!
>> Abhi
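For context, the behaviour Ted describes comes down to
revertPartialWritesAndClose() catching the IOException and only logging it,
so the caller never learns that the disk is full. A rough, self-contained
Scala sketch of that catch-and-log pattern (names simplified and
hypothetical; this is not the actual Spark source):

    import java.io.{File, FileOutputStream}

    // Sketch of the pattern in DiskBlockObjectWriter: the revert method
    // swallows the failure instead of propagating it to the caller.
    class PartialWriter(file: File, initialPosition: Long) {
      def revertPartialWritesAndClose(): Unit = {
        try {
          // Truncate the file back to the last committed position.
          val stream = new FileOutputStream(file, true)
          try stream.getChannel.truncate(initialPosition)
          finally stream.close()
        } catch {
          case e: Exception =>
            // Logged but NOT rethrown: a disk-full error surfaces only in
            // the executor log, and the task can appear to hang rather
            // than fail fast.
            System.err.println(
              s"Uncaught exception while reverting partial writes to file $file: $e")
        }
      }
    }

As a partial mitigation (assuming the worker has more than one disk),
shuffle spill can be spread across several directories via the standard
spark.local.dir setting, e.g.:

    val conf = new org.apache.spark.SparkConf()
      // Comma-separated list; these paths are placeholders.
      .set("spark.local.dir", "/data1/spark-tmp,/data2/spark-tmp")

Note that in standalone mode SPARK_LOCAL_DIRS set in the worker
environment overrides spark.local.dir, so the setting may need to go
there instead. This spreads the load but doesn't help once every disk is
full, so monitoring free space on the workers is still needed.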