Hey Kevin,

It is an empty directory. Spark is able to write the part files into it, but we get the above error while it is merging those part files.
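For reference, we understand the "overwrite" mode you mention below as something like the following minimal SparkR sketch; the session setup, the sample data frame, and the "csv" source are placeholders for illustration, not our actual job:

library(SparkR)
sparkR.session()                      # connect to the cluster as appropriate

df <- as.DataFrame(faithful)          # placeholder data, for illustration only

# mode = "overwrite" tells Spark to replace any existing output at the target
# path instead of failing when files are already present (default mode is "error").
write.df(df,
         path = "/nfspartition/sankar/banking_l1_v2.csv",
         source = "csv",
         mode = "overwrite")

That said, since the target directory is empty in our case, the mode setting may not change anything; noting it here only so we are on the same page.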
Regards

On Tue, Sep 20, 2016 at 7:46 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:

> Have you checked to see if any files already exist at
> /nfspartition/sankar/banking_l1_v2.csv? If so, you will need to delete
> them before attempting to save your DataFrame to that location.
> Alternatively, you may be able to specify the "mode" setting of the
> df.write operation to "overwrite", depending on the version of Spark you
> are running.
>
> *ERROR (from log)*
> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or
> dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]:
> it still exists.
> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or
> dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]:
> it still exists.
>
> *df.write Documentation*
> http://spark.apache.org/docs/latest/api/R/write.df.html
>
> Thanks,
> Kevin
>
> On Tue, Sep 20, 2016 at 12:16 AM, sankarmittapally
> <sankar.mittapally@creditvidya.com> wrote:
>
>> We have set up a Spark cluster on NFS shared storage. There are no
>> permission issues with the NFS storage; all users are able to write to it.
>> When I run the write.df command in SparkR, I get the error below. Can
>> someone please help me fix this issue?
>>
>> 16/09/17 08:03:28 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
>> java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{
>> path=file:/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv;
>> isDirectory=false; length=436486316; replication=1; blocksize=33554432;
>> modification_time=1474099400000; access_time=0; owner=; group=;
>> permission=rw-rw-rw-; isSymlink=false}
>> to
>> file:/nfspartition/sankar/banking_l1_v2.csv/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv
>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
>>   at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>>   at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
>>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:498)
>>   at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
>>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
>>   at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>   at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>   at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
>>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>   at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>>   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>   at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>>   at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>   at java.lang.Thread.run(Thread.java:745)
>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or
>> dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]:
>> it still exists.
>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or
>> dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]:
>> it still exists.
>> 16/09/17 08:03:28 ERROR DefaultWriterContainer: Job job_201609170803_0000 aborted.
>> 16/09/17 08:03:28 ERROR RBackendHandler: save on 625 failed
>> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
>>   org.apache.spark.SparkException: Job aborted.
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doE
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/write-df-is-failing-on-Spark-Cluster-tp27761.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>

--
Regards
Sankar Mittapally
Senior Software Engineer