I have since resolved the issue. The problem was that multiple RDDs were trying to write to the same S3 bucket.
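The fix pattern, sketched below with illustrative names (`OutputPaths`, the bucket and RDD names, and the `s3n://` scheme are assumptions, not from the thread): give each RDD its own output prefix so that several `saveAsTextFile` calls never commit into the same location.

```scala
// Sketch only: build a distinct S3 output location per RDD instead of
// pointing several saveAsTextFile calls at one shared path. All names
// here are hypothetical.
object OutputPaths {
  // One unique prefix per RDD under a shared bucket, so the Hadoop
  // FileOutputCommitter of one job cannot clobber another's output.
  def outputPathFor(bucket: String, rddName: String): String =
    s"s3n://$bucket/output/$rddName"
}

// Usage (Spark calls shown as comments, since they need a live cluster):
// rddA.saveAsTextFile(OutputPaths.outputPathFor("my-bucket", "rdd-a"))
// rddB.saveAsTextFile(OutputPaths.outputPathFor("my-bucket", "rdd-b"))
```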
Grega
--
*Grega Kešpret*
Analytics engineer

Celtra — Rich Media Mobile Advertising
celtra.com <http://www.celtra.com/> | @celtramobile <http://www.twitter.com/celtramobile>

On Thu, May 22, 2014 at 8:18 AM, Grega Kešpret <gr...@celtra.com> wrote:
> Hello,
>
> my last reduce task in the job always fails with "java.io.IOException:
> Failed to save output of task" when using saveAsTextFile with an S3
> endpoint (all others are successful). Has anyone had similar problems?
>
> https://gist.github.com/gregakespret/813b540faca678413ad4
>
> -------------
>
> 14/05/21 21:44:45 ERROR SparkHadoopWriter: Error committing the output of task: attempt_201405212144_0000_m_000000_3432
> java.io.IOException: Failed to save output of task: attempt_201405212144_0000_m_000000_3432
>         at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:160)
>         at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172)
>         at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132)
>         at org.apache.hadoop.mapred.SparkHadoopWriter.commit(SparkHadoopWriter.scala:110)
>         at org.apache.spark.rdd.PairRDDFunctions.org$apache$spark$rdd$PairRDDFunctions$$writeToFile$1(PairRDDFunctions.scala:731)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:734)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:734)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
>         at org.apache.spark.scheduler.Task.run(Task.scala:53)
>         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>
> Grega
> --
> *Grega Kešpret*
> Analytics engineer
>
> Celtra — Rich Media Mobile Advertising
> celtra.com <http://www.celtra.com/> | @celtramobile <http://www.twitter.com/celtramobile>