I tried that one as well.

On Sep 20, 2016 10:44 PM, "Kevin Mellott" <kevin.r.mell...@gmail.com> wrote:

> Instead of *mode="append"*, try *mode="overwrite"*
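>
> For example, applied to the saveDF call from your last mail (just a
> sketch, assuming Spark 2.0 SparkR; note that the csv source takes a
> "header" option, and as far as I know "schema" is not a csv write
> option):
>
> # mode="overwrite" replaces any existing output at the target path.
> saveDF(sankar2, "sankartest.csv", source = "csv",
>        mode = "overwrite", header = "true")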
>
> On Tue, Sep 20, 2016 at 11:30 AM, Sankar Mittapally <sankar.mittapally@
> creditvidya.com> wrote:
>
>> Please find the code below.
>>
>> sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json")
>>
>> I tried these two commands.
>> write.df(sankar2, "/nfspartition/sankar/test/test.csv", "csv", header="true")
>>
>> saveDF(sankar2,"sankartest.csv",source="csv",mode="append",schema="true")
>>
>>
>>
>> On Tue, Sep 20, 2016 at 9:40 PM, Kevin Mellott <kevin.r.mell...@gmail.com
>> > wrote:
>>
>>> Can you please post the line of code that is doing the df.write command?
>>>
>>> On Tue, Sep 20, 2016 at 9:29 AM, Sankar Mittapally <
>>> sankar.mittapa...@creditvidya.com> wrote:
>>>
>>>> Hey Kevin,
>>>>
>>>> It is an empty directory. Spark is able to write the part files into
>>>> it, but while merging those part files we get the above error.
>>>>
>>>> Regards
>>>>
>>>>
>>>> On Tue, Sep 20, 2016 at 7:46 PM, Kevin Mellott <
>>>> kevin.r.mell...@gmail.com> wrote:
>>>>
>>>>> Have you checked whether any files already exist at
>>>>> /nfspartition/sankar/banking_l1_v2.csv? If so, you will need to
>>>>> delete them before attempting to save your DataFrame to that location.
>>>>> Alternatively, you may be able to set the "mode" option of the
>>>>> df.write operation to "overwrite", depending on the version of Spark
>>>>> you are running. A sketch of both options follows.
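>>>>>
>>>>> This is only a sketch, assuming Spark 2.0 SparkR; df stands in for
>>>>> your DataFrame, and the output path is taken from your log:
>>>>>
>>>>> # Option 1: remove the partial output left by the failed job (base R).
>>>>> unlink("/nfspartition/sankar/banking_l1_v2.csv", recursive = TRUE)
>>>>>
>>>>> # Option 2: let Spark replace any existing output instead of failing.
>>>>> write.df(df, path = "/nfspartition/sankar/banking_l1_v2.csv",
>>>>>          source = "csv", mode = "overwrite", header = "true")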
>>>>>
>>>>> *ERROR (from log)*
>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.
>>>>>
>>>>> *df.write Documentation*
>>>>> http://spark.apache.org/docs/latest/api/R/write.df.html
>>>>>
>>>>> Thanks,
>>>>> Kevin
>>>>>
>>>>> On Tue, Sep 20, 2016 at 12:16 AM, sankarmittapally <
>>>>> sankar.mittapa...@creditvidya.com> wrote:
>>>>>
>>>>>> We have set up a Spark cluster on NFS shared storage. There are no
>>>>>> permission issues with the NFS storage; all users are able to write
>>>>>> to it. When I run the write.df command in SparkR, I get the error
>>>>>> below. Can someone please help me fix this issue?
>>>>>>
>>>>>>
>>>>>> 16/09/17 08:03:28 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
>>>>>> java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv; isDirectory=false; length=436486316; replication=1; blocksize=33554432; modification_time=1474099400000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/nfspartition/sankar/banking_l1_v2.csv/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv
>>>>>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
>>>>>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
>>>>>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
>>>>>> at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>>>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>>>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>>>>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>>>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>>>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>>>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>>>>>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>>>>>> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>>>>>> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>>>>>> at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
>>>>>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>>>>>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>> at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>>>>>> at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
>>>>>> at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
>>>>>> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>>>>>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>>>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>>>> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>>>>>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>>>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>>>> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
>>>>>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>>>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>>>> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>>>>>> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>>>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>>>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>>>>> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>>>>>> at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
>>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.
>>>>>> 16/09/17 08:03:28 ERROR DefaultWriterContainer: Job job_201609170803_0000 aborted.
>>>>>> 16/09/17 08:03:28 ERROR RBackendHandler: save on 625 failed
>>>>>> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
>>>>>> org.apache.spark.SparkException: Job aborted.
>>>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>>>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>>>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>>>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.doE
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>>
>>>> Sankar Mittapally
>>>> Senior Software Engineer
>>>>
>>>
>>>
>>
>>
>> --
>> Regards
>>
>> Sankar Mittapally
>> Senior Software Engineer
>>
>
>
