Hi, I am debugging a situation where SortShuffleWriter sometimes fails to create a file, with the following stack trace:

16/02/23 11:48:46 ERROR Executor: Exception in task 13.0 in stage 47827.0 (TID 1367089)
java.io.FileNotFoundException: /tmp/spark-9dd8dca9-6803-4c6c-bb6a-0e9c0111837c/executor-129dfdb8-9422-4668-989e-e789703526ad/blockmgr-dda6e340-7859-468f-b493-04e4162d341a/00/temp_shuffle_69fe1673-9ff2-462b-92b8-683d04669aad (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I checked the Linux file system (ext4) and saw that the /00/ subdir was missing. I went through a heap dump of the CoarseGrainedExecutorBackend JVM process and found that DiskBlockManager's subDirs list had more non-null 2-hex subdirs than were present on the file system! As a test, I created all 64 2-hex subdirs by hand, and the problem went away.

Has anybody else seen this problem?

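In case it helps anyone check for the same condition, here is a rough Python sketch of the file-system side of the check and of the manual workaround. It is only an illustration: the function names are hypothetical and have nothing to do with Spark's own code, and the count of 64 is simply the number of 2-hex subdirs I saw under one blockmgr directory. As far as I understand, the subdirs are created on demand, so a missing one is only suspicious if the executor already holds a reference to it.

```python
import os

# Assumption: 64 two-hex subdirs (00..3f) per blockmgr directory,
# matching what was observed in this report.
SUBDIRS_PER_LOCAL_DIR = 64


def expected_subdirs(count=SUBDIRS_PER_LOCAL_DIR):
    """Return the two-hex-digit subdir names, e.g. 00..3f for count=64."""
    return ["%02x" % i for i in range(count)]


def missing_subdirs(blockmgr_dir, count=SUBDIRS_PER_LOCAL_DIR):
    """List the expected subdirs that are not present on disk."""
    return [
        name
        for name in expected_subdirs(count)
        if not os.path.isdir(os.path.join(blockmgr_dir, name))
    ]


def create_missing_subdirs(blockmgr_dir, count=SUBDIRS_PER_LOCAL_DIR):
    """Create every missing subdir (the manual workaround); return their names."""
    created = missing_subdirs(blockmgr_dir, count)
    for name in created:
        os.makedirs(os.path.join(blockmgr_dir, name), exist_ok=True)
    return created
```

Calling missing_subdirs() on the blockmgr-... directory from the trace shows which subdirs are absent, and create_missing_subdirs() reproduces what I did by hand. This only hides the symptom; it does not explain why the directories disappeared.
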
I looked at the relevant logic in DiskBlockManager, and it hasn't changed much since the fix for https://issues.apache.org/jira/browse/SPARK-6468

My configuration: spark-1.5.1, hadoop-2.6.0, standalone mode, Oracle JDK 8u60

Thanks,
Zee