Have you checked whether any files already exist at /nfspartition/sankar/banking_l1_v2.csv? If so, you will need to delete them before saving your DataFrame to that location. Alternatively, depending on the version of Spark you are running, you may be able to set the "mode" option of the write.df operation to "overwrite".
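For example, something along these lines (an untested sketch; it assumes your SparkDataFrame is named df and that you are writing it out with the csv source):

    # mode = "overwrite" replaces any existing output at the target path;
    # the default mode ("error") aborts when the path already exists
    write.df(df,
             path = "/nfspartition/sankar/banking_l1_v2.csv",
             source = "csv",
             mode = "overwrite")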
*From the log*

16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.

*write.df documentation*
http://spark.apache.org/docs/latest/api/R/write.df.html

Thanks,
Kevin

On Tue, Sep 20, 2016 at 12:16 AM, sankarmittapally <sankar.mittapa...@creditvidya.com> wrote:

> We have set up a Spark cluster on NFS shared storage. There are no
> permission issues with the NFS storage; all users are able to write to it.
> When I run the write.df command in SparkR, I get the error below. Can
> someone please help me fix this issue?
>
> 16/09/17 08:03:28 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
> java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv; isDirectory=false; length=436486316; replication=1; blocksize=33554432; modification_time=1474099400000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}
> to file:/nfspartition/sankar/banking_l1_v2.csv/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
>   at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>   at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
>   at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>   at java.lang.Thread.run(Thread.java:745)
> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.
> 16/09/17 08:03:28 ERROR DefaultWriterContainer: Job job_201609170803_0000 aborted.
> 16/09/17 08:03:28 ERROR RBackendHandler: save on 625 failed
> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
>   org.apache.spark.SparkException: Job aborted.
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doE
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/write-df-is-failing-on-Spark-Cluster-tp27761.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org