Try providing full path to the file you want to write, and make sure the directory exists and is writable by the Spark process.
On Wed, Sep 10, 2014 at 3:46 PM, Arun Luthra <[email protected]> wrote: > I have a spark program that worked in local mode, but throws an error in > yarn-client mode on a cluster. On the edge node in my home directory, I > have an output directory (called transout) which is ready to receive files. > The spark job I'm running is supposed to write a few hundred files into > that directory, once for each iteration of a foreach function. This works > in local mode, and my only guess as to why this would fail in yarn-client > mode is that the RDD is distributed across many nodes and the program is > trying to use the PrintWriter on the datanodes, where the output directory > doesn't exist. Is this what's happening? Any proposed solution? > > abbreviation of the code: > > import java.io.PrintWriter > ... > rdd.foreach { > val outFile = new PrintWriter("transoutput/output.%s".format(id)) > outFile.println("test") > outFile.close() > } > > Error: > > 14/09/10 16:57:09 WARN TaskSetManager: Lost TID 1826 (task 0.0:26) > 14/09/10 16:57:09 WARN TaskSetManager: Loss was due to > java.io.FileNotFoundException > java.io.FileNotFoundException: transoutput/input.598718 (No such file or > directory) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.<init>(FileOutputStream.java:194) > at java.io.FileOutputStream.<init>(FileOutputStream.java:84) > at java.io.PrintWriter.<init>(PrintWriter.java:146) > at > com.att.bdcoe.cip.ooh.TransformationLayer$$anonfun$main$3.apply(TransformLayer.scala:98) > at > com.att.bdcoe.cip.ooh.TransformationLayer$$anonfun$main$3.apply(TransformLayer.scala:95) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:703) > at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:703) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) > at org.apache.spark.scheduler.Task.run(Task.scala:51) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) >
