Can you dig a bit more into the worker logs? Also make sure that Spark has permission to write to /opt/ on that machine, since it's the one machine that keeps failing.
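A quick way to verify the permission question is a small shell check run as the user that launches the worker (assumed below to be "spark"; adjust to your setup). The /opt path is taken from the error in the log; the chown hint at the end is only a common fix, not necessarily the right one for your layout.

```shell
# Minimal sketch: check whether a directory exists and is writable by the
# current user. Run this as the user that starts the Spark worker.
check_writable() {
    dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "writable: $dir"
        return 0
    else
        echo "NOT writable or missing: $dir"
        return 1
    fi
}

# Path from the FileAppender error in the log; "spark:spark" below is an
# assumed owner -- substitute whatever user/group runs your worker.
check_writable /opt/spark-1.4.0-bin-hadoop2.6/work \
    || echo "consider: sudo chown -R spark:spark /opt/spark-1.4.0-bin-hadoop2.6/work"
```

Also compare `ls -ld /opt /opt/spark-1.4.0-bin-hadoop2.6/work` on the failing machine against a healthy worker; a stray root-owned work directory on just that host would match the symptom.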
Thanks
Best Regards

On Sat, Jul 11, 2015 at 11:18 PM, gaurav sharma <sharmagaura...@gmail.com> wrote:
> Hi All,
>
> I am facing this issue in my production environment.
>
> My worker dies by throwing this exception.
> But I see that space is available on all the partitions on my disk.
> I did NOT see any abrupt increase in disk IO, which might have choked the
> executor writing to the stderr file.
>
> Still my worker dies. This is not happening on all my workers; it's
> one machine that is behaving this way.
> Could you please help me debug whether it is happening because I am doing
> something wrong, or because of some hardware/OS issue that I can
> debug and fix?
>
>
> 15/07/11 18:05:45 ERROR Worker: RECEIVED SIGNAL 1: SIGHUP
> 15/07/11 18:05:45 INFO ExecutorRunner: Killing process!
> 15/07/11 18:05:45 ERROR FileAppender: Error writing stream to file /opt/spark-1.4.0-bin-hadoop2.6/work/app-20150710162005-0001/16517/stderr
> java.io.IOException: Stream closed
>         at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         at java.io.FilterInputStream.read(FilterInputStream.java:107)
>         at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
>         at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
>         at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
>         at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
>         at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
> 15/07/11 18:05:46 INFO Utils: Shutdown hook called
> 15/07/11 18:05:46 INFO Utils: Deleting directory /tmp/spark-f269acd9-3ab0-4b3c-843c-bcf2e8c2669f
> 15/07/11 18:05:46 INFO Worker: Executor app-20150710162005-0001/16517 finished with state EXITED message Command exited with code 129 exitStatus 129