OK, so I don't think the workers on the data nodes will be able to see my
output directory on the edge node. I don't think stdout will work either,
so I'll write to HDFS via rdd.saveAsTextFile(...).
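One thing to keep in mind with saveAsTextFile: it writes a directory containing one part-NNNNN file per partition, not one file per record. Getting a few hundred output files therefore means having that many partitions. The sketch below imitates that layout locally without Spark. It assumes each inner Seq stands in for one partition, and PartFileLayout and writeParts are hypothetical names, not Spark API.

```scala
import java.io.{File, PrintWriter}

// Local, Spark-free illustration only. PartFileLayout and writeParts are
// hypothetical names; each inner Seq plays the role of one RDD partition.
object PartFileLayout {
  // Writes each partition's lines to its own part-NNNNN file in outDir,
  // imitating the layout of a saveAsTextFile output directory.
  def writeParts(outDir: File, partitions: Seq[Seq[String]]): Seq[File] = {
    // mkdirs() returns false if the directory already exists, so check both.
    require(outDir.mkdirs() || outDir.isDirectory, s"cannot create $outDir")
    partitions.zipWithIndex.map { case (lines, i) =>
      val file = new File(outDir, f"part-$i%05d") // part-00000, part-00001, ...
      val out = new PrintWriter(file, "UTF-8")
      try lines.foreach(line => out.println(line))
      finally out.close()
      file
    }
  }
}
```

With three partitions this yields part-00000 through part-00002, regardless of how many records each one holds.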

On Wed, Sep 10, 2014 at 3:51 PM, Daniil Osipov wrote:

> Try providing the full path to the file you want to write, and make sure the
> directory exists and is writable by the Spark process.
>
> On Wed, Sep 10, 2014 at 3:46 PM, Arun Luthra wrote:
>
>> I have a Spark program that works in local mode but throws an error in
>> yarn-client mode on a cluster. On the edge node, in my home directory, I
>> have an output directory (called transout) that is ready to receive files.
>> The Spark job is supposed to write a few hundred files into that directory,
>> one for each iteration of a foreach function. This works in local mode, and
>> my only guess as to why it fails in yarn-client mode is that the RDD is
>> distributed across many nodes and the program is trying to use the
>> PrintWriter on the data nodes, where the output directory doesn't exist.
>> Is this what's happening? Any proposed solution?
>>
>> Abbreviated code:
>>
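>> import java.io.PrintWriter
>> ...
>> rdd.foreach { id =>
>>   // This closure runs on the executors, so the relative path is resolved
>>   // against each executor's working directory on its worker node.
>>   val outFile = new PrintWriter("transoutput/output.%s".format(id))
>>   try outFile.println("test")
>>   finally outFile.close()
>> }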
>>
>> Error:
>>
>> 14/09/10 16:57:09 WARN TaskSetManager: Lost TID 1826 (task 0.0:26)
>> 14/09/10 16:57:09 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException
>> java.io.FileNotFoundException: transoutput/input.598718 (No such file or directory)
>> at java.io.FileOutputStream.open(Native Method)
>> at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>> at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
>> at java.io.PrintWriter.<init>(PrintWriter.java:146)
>> at com.att.bdcoe.cip.ooh.TransformationLayer$$anonfun$main$3.apply(TransformLayer.scala:98)
>> at com.att.bdcoe.cip.ooh.TransformationLayer$$anonfun$main$3.apply(TransformLayer.scala:95)
>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>> at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:703)
>> at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:703)
>> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>>
>
>
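If the output is small enough to fit in driver memory, there is another option. In yarn-client mode the driver runs on the edge node, so results brought back with rdd.collect() or rdd.toLocalIterator can be written there with PrintWriter, using a full path as Daniil suggests. Here is a minimal, Spark-free sketch. It assumes an Iterator of (id, text) pairs stands in for the collected RDD, and DriverSideWrite and writeAll are hypothetical names.

```scala
import java.io.{File, PrintWriter}

// Driver-side sketch. DriverSideWrite and writeAll are hypothetical names;
// `records` stands in for rdd.toLocalIterator (or rdd.collect().iterator),
// which brings the data back to the driver on the edge node in yarn-client mode.
object DriverSideWrite {
  // Writes one file named output.<id> per (id, text) pair; returns the count.
  def writeAll(outDir: String, records: Iterator[(String, String)]): Int = {
    // Resolve to a full path up front, then check the directory exists and is writable.
    val dir = new File(outDir).getAbsoluteFile
    if (!dir.isDirectory && !dir.mkdirs())
      throw new IllegalStateException(s"cannot create $dir")
    if (!dir.canWrite)
      throw new IllegalStateException(s"$dir is not writable")
    var count = 0
    records.foreach { case (id, text) =>
      val out = new PrintWriter(new File(dir, s"output.$id"), "UTF-8")
      try out.println(text)
      finally out.close()
      count += 1
    }
    count
  }
}
```

Because the files are written by the driver process, this only reaches the edge-node directory in yarn-client mode. In yarn-cluster mode the driver itself runs inside a YARN container on the cluster.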
