Re: PrintWriter error in foreach

Daniil Osipov Wed, 10 Sep 2014 15:53:06 -0700

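Try providing the full (absolute) path to the file you want to write, and
make sure the directory exists and is writable by the Spark process. In
yarn-client mode the body of foreach runs on the executors, so a relative
path is resolved on each worker node rather than in your home directory on
the edge node.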
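
A minimal sketch of that advice, using only java.io so it runs without a
cluster. The object name WriteOutput, the helper writeRecord, and the
temporary directory used in main are hypothetical and are not taken from the
original program; only the PrintWriter usage and the "output.<id>" file name
follow the quoted code.

```scala
import java.io.{File, PrintWriter}

object WriteOutput {
  // Hypothetical helper: writes one line to <outDir>/output.<id>.
  // Insists on an absolute directory and creates it if it is missing.
  def writeRecord(outDir: File, id: String, line: String): File = {
    require(outDir.isAbsolute, s"expected an absolute path, got $outDir")
    // mkdirs() returns false when the directory already exists, so check both.
    if (!outDir.isDirectory && !outDir.mkdirs()) {
      throw new java.io.IOException(s"cannot create directory $outDir")
    }
    val target = new File(outDir, s"output.$id")
    val out = new PrintWriter(target)
    try out.println(line) finally out.close()
    target
  }

  def main(args: Array[String]): Unit = {
    // Assumed location, for demonstration only.
    val outDir =
      new File(System.getProperty("java.io.tmpdir"), "transout-demo").getAbsoluteFile
    val written = writeRecord(outDir, "598718", "test")
    println(s"wrote $written")
  }
}
```

Inside the job, the helper would be called from the foreach body with an
absolute directory in place of the relative one. Note that the files would
still be created on the local disks of whichever nodes run the executors, not
on the edge node.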


On Wed, Sep 10, 2014 at 3:46 PM, Arun Luthra <[email protected]> wrote:

> I have a Spark program that works in local mode but throws an error in
> yarn-client mode on a cluster. On the edge node, in my home directory, I
> have an output directory (called transout) that is ready to receive files.
> The Spark job I'm running is supposed to write a few hundred files into
> that directory, one for each iteration of a foreach function. This works
> in local mode, and my only guess as to why it fails in yarn-client
> mode is that the RDD is distributed across many nodes and the program is
> trying to use the PrintWriter on the datanodes, where the output directory
> doesn't exist. Is this what's happening? Any proposed solution?
>
> Abbreviated code:
>
> import java.io.PrintWriter
> ...
> rdd.foreach { id =>
>   val outFile = new PrintWriter("transoutput/output.%s".format(id))
>   outFile.println("test")
>   outFile.close()
> }
>
> Error:
>
> 14/09/10 16:57:09 WARN TaskSetManager: Lost TID 1826 (task 0.0:26)
> 14/09/10 16:57:09 WARN TaskSetManager: Loss was due to
> java.io.FileNotFoundException
> java.io.FileNotFoundException: transoutput/input.598718 (No such file or
> directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
> at java.io.PrintWriter.<init>(PrintWriter.java:146)
> at
> com.att.bdcoe.cip.ooh.TransformationLayer$$anonfun$main$3.apply(TransformLayer.scala:98)
> at
> com.att.bdcoe.cip.ooh.TransformationLayer$$anonfun$main$3.apply(TransformLayer.scala:95)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:703)
> at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:703)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
>
