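Try providing the full path to the file you want to write, and make sure the
directory exists and is writable by the Spark process. In yarn-client mode
the foreach closure runs on the executors, so the directory has to exist on
every worker node, not just on the edge node.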
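
As a rough sketch of that advice (not the original poster's code): the helper
below resolves the output directory to an absolute path, creates it if it is
missing, checks that it is writable, and only then opens the PrintWriter.
The names ExecutorSideWrite and writeOutput, and the output.<id> file naming,
are hypothetical and chosen only for illustration.

```scala
import java.io.PrintWriter
import java.nio.file.{Files, Path, Paths}

object ExecutorSideWrite {
  // Hypothetical helper: writes one line to <baseDir>/output.<id> and
  // returns the absolute path of the file that was written.
  def writeOutput(baseDir: String, id: String, line: String): Path = {
    // Resolve to an absolute path so the result does not depend on the
    // working directory of whichever process runs this code.
    val dir = Paths.get(baseDir).toAbsolutePath.normalize()

    // Create the directory (and any missing parents); does nothing if the
    // directory already exists.
    Files.createDirectories(dir)

    // Fail early with a clear message if this process cannot write there.
    require(Files.isWritable(dir), s"$dir is not writable by this process")

    val target = dir.resolve(s"output.$id")
    val out = new PrintWriter(target.toFile)
    try out.println(line) finally out.close()
    target
  }
}
```

Inside the job this could be called as
rdd.foreach { id => ExecutorSideWrite.writeOutput("/some/absolute/dir", id.toString, "test") },
where the path is a placeholder. Note that each file still lands on the local
disk of whichever node ran the task, so this only helps if per-node output is
acceptable.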

On Wed, Sep 10, 2014 at 3:46 PM, Arun Luthra <[email protected]> wrote:

> I have a Spark program that works in local mode but throws an error in
> yarn-client mode on a cluster. On the edge node, in my home directory, I
> have an output directory (called transout) that is ready to receive files.
> The Spark job I'm running is supposed to write a few hundred files into
> that directory, one for each iteration of a foreach function. This works
> in local mode, and my only guess as to why it fails in yarn-client
> mode is that the RDD is distributed across many nodes and the program is
> trying to use the PrintWriter on the datanodes, where the output directory
> doesn't exist. Is this what's happening? Any proposed solution?
>
> Abbreviated version of the code:
>
> import java.io.PrintWriter
> ...
> rdd.foreach {
>   val outFile = new PrintWriter("transoutput/output.%s".format(id))
>   outFile.println("test")
>   outFile.close()
> }
>
> Error:
>
> 14/09/10 16:57:09 WARN TaskSetManager: Lost TID 1826 (task 0.0:26)
> 14/09/10 16:57:09 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException
> java.io.FileNotFoundException: transoutput/input.598718 (No such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
> at java.io.PrintWriter.<init>(PrintWriter.java:146)
> at com.att.bdcoe.cip.ooh.TransformationLayer$$anonfun$main$3.apply(TransformLayer.scala:98)
> at com.att.bdcoe.cip.ooh.TransformationLayer$$anonfun$main$3.apply(TransformLayer.scala:95)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:703)
> at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:703)
> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
>
