Hi all, I'm running a job that seems to continually fail with the following exception:
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
        ...
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:330)

This is running spark-assembly-1.0.0-hadoop2.3.0 through YARN. The only additional error I see is:

14/06/20 10:44:15 WARN NewHadoopRDD: Exception in RecordReader.close()
net.sf.samtools.util.RuntimeIOException: java.io.IOException: Filesystem closed

I had thought the "Filesystem closed" issue was resolved in https://issues.apache.org/jira/browse/SPARK-1676. I've also tried running with a single core to avoid it, which sometimes seems to help, since the failure is intermittent.

I also saw a previous mail thread with a suggestion to disable caching (see the P.S. below for what I'm planning to try):
http://apache-spark-user-list.1001560.n3.nabble.com/Filesystem-closed-while-running-spark-job-td4596.html

Has anyone seen this before, or does anyone know of a resolution? As I mentioned, it's intermittent: sometimes the job runs to completion, and sometimes it fails this way.

Thanks,
Arun
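
P.S. To be concrete, here is roughly how I'd apply the disable-caching suggestion. I'm assuming it refers to the Hadoop FileSystem cache (the fs.hdfs.impl.disable.cache property); the app name below is just a placeholder, so treat this as an untested sketch rather than something I know fixes the problem:

    import org.apache.spark.{SparkConf, SparkContext}

    // Pass the Hadoop property through Spark's "spark.hadoop.*" prefix so that
    // each task gets its own FileSystem instance instead of the shared cached
    // one that another task may already have closed.
    val conf = new SparkConf()
      .setAppName("my-job")  // placeholder name
      .set("spark.hadoop.fs.hdfs.impl.disable.cache", "true")
    val sc = new SparkContext(conf)

The same property could instead be set cluster-wide in core-site.xml, but I'd rather keep it scoped to this one job if possible.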