Spark version, please?
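(A quick way to answer this from a SparkR session, assuming a SparkR build recent
enough to have sparkR.version(); packageVersion() is plain R and works regardless:)

    library(SparkR)
    sparkR.session()           # start (or reuse) a Spark session
    sparkR.version()           # Spark version the session is connected to
    packageVersion("SparkR")   # version of the installed SparkR package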
On 21 September 2016 at 09:46, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:

> Yeah, I can do all operations on that folder.
>
> On Sep 21, 2016 12:15 AM, "Kevin Mellott" <kevin.r.mell...@gmail.com> wrote:
>
>> Are you able to manually delete the folder below? I'm wondering if there
>> is some sort of non-Spark factor involved (permissions, etc.).
>>
>> /nfspartition/sankar/banking_l1_v2.csv
>>
>> On Tue, Sep 20, 2016 at 12:19 PM, Sankar Mittapally <
>> sankar.mittapa...@creditvidya.com> wrote:
>>
>>> I used that one also.
>>>
>>> On Sep 20, 2016 10:44 PM, "Kevin Mellott" <kevin.r.mell...@gmail.com> wrote:
>>>
>>>> Instead of *mode="append"*, try *mode="overwrite"*.
>>>>
>>>> On Tue, Sep 20, 2016 at 11:30 AM, Sankar Mittapally <
>>>> sankar.mittapa...@creditvidya.com> wrote:
>>>>
>>>>> Please find the code below.
>>>>>
>>>>> sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json")
>>>>>
>>>>> I tried these two commands:
>>>>>
>>>>> write.df(sankar2, "/nfspartition/sankar/test/test.csv", "csv",
>>>>>          header="true")
>>>>> saveDF(sankar2, "sankartest.csv", source="csv", mode="append",
>>>>>        schema="true")
>>>>>
>>>>> On Tue, Sep 20, 2016 at 9:40 PM, Kevin Mellott <
>>>>> kevin.r.mell...@gmail.com> wrote:
>>>>>
>>>>>> Can you please post the line of code that is doing the df.write
>>>>>> command?
>>>>>>
>>>>>> On Tue, Sep 20, 2016 at 9:29 AM, Sankar Mittapally <
>>>>>> sankar.mittapa...@creditvidya.com> wrote:
>>>>>>
>>>>>>> Hey Kevin,
>>>>>>>
>>>>>>> It is an empty directory. Spark is able to write the part files into
>>>>>>> it, but we get the error above while those part files are being
>>>>>>> merged.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> On Tue, Sep 20, 2016 at 7:46 PM, Kevin Mellott <
>>>>>>> kevin.r.mell...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Have you checked to see if any files already exist at
>>>>>>>> /nfspartition/sankar/banking_l1_v2.csv? If so, you will need to
>>>>>>>> delete them before attempting to save your DataFrame to that
>>>>>>>> location. Alternatively, you may be able to set the "mode" option of
>>>>>>>> the df.write operation to "overwrite", depending on the version of
>>>>>>>> Spark you are running. (A sketch combining both options appears
>>>>>>>> after the full log at the end of this thread.)
>>>>>>>>
>>>>>>>> *ERROR (from log)*
>>>>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir
>>>>>>>> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]:
>>>>>>>> it still exists.
>>>>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir
>>>>>>>> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]:
>>>>>>>> it still exists.
>>>>>>>>
>>>>>>>> *df.write Documentation*
>>>>>>>> http://spark.apache.org/docs/latest/api/R/write.df.html
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kevin
>>>>>>>> On Tue, Sep 20, 2016 at 12:16 AM, sankarmittapally <
>>>>>>>> sankar.mittapa...@creditvidya.com> wrote:
>>>>>>>>
>>>>>>>>> We have set up a Spark cluster on NFS shared storage. There are no
>>>>>>>>> permission issues with the NFS storage; all of the users are able
>>>>>>>>> to write to it. When I fire a write.df command in SparkR, I get
>>>>>>>>> the error below. Can someone please help me fix this issue?
>>>>>>>>>
>>>>>>>>> 16/09/17 08:03:28 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
>>>>>>>>> java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus
>>>>>>>>> {path=file:/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv;
>>>>>>>>> isDirectory=false; length=436486316; replication=1; blocksize=33554432;
>>>>>>>>> modification_time=1474099400000; access_time=0; owner=; group=;
>>>>>>>>> permission=rw-rw-rw-; isSymlink=false}
>>>>>>>>> to
>>>>>>>>> file:/nfspartition/sankar/banking_l1_v2.csv/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv
>>>>>>>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
>>>>>>>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
>>>>>>>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
>>>>>>>>>   at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>>>>>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>>>>>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>>>>>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>>>>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>>>>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>>>>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>>>>>>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>>>>>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>>>>>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>>>>>>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>>>>>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>>>>>>>>>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>>>>>>>>>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>>>>>>>>>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>>>>>>>>>   at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
>>>>>>>>>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>>>>>>>>>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>>>>>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>   at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>>>>   at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>>>>>>>>>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
>>>>>>>>>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
>>>>>>>>>   at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>>>>>>>>>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>>>>>>>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>>>>>>>   at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>>>>>>>>>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>>>>>>>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>>>>>>>   at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
>>>>>>>>>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>>>>>>>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>>>>>>>   at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>>>>>>>>>   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>>>>>>>>>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>>>>>>>>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>>>>>>>>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>>>>>>>>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>>>>>>>>   at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>>>>>>>>>   at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>>>>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir
>>>>>>>>> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]:
>>>>>>>>> it still exists.
>>>>>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir
>>>>>>>>> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]:
>>>>>>>>> it still exists.
>>>>>>>>> 16/09/17 08:03:28 ERROR DefaultWriterContainer: Job job_201609170803_0000 aborted.
>>>>>>>>> 16/09/17 08:03:28 ERROR RBackendHandler: save on 625 failed
>>>>>>>>> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
>>>>>>>>>   org.apache.spark.SparkException: Job aborted.
>>>>>>>>>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>>>>>>>>>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>>>>>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>>>>>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>>>>>>>>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>>>>>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>>>>>>>>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>>>>>>>>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.doE
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/write-df-is-failing-on-Spark-Cluster-tp27761.html
>>>>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
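A minimal SparkR sketch combining the two fixes suggested up-thread (clear the
existing output directory, or write with mode="overwrite"). The read.df call
repeats the one posted above with the json source made explicit, the output path
is the one from the log, and mode="overwrite" assumes a write.df recent enough
to support it (see the documentation link in Kevin's reply):

    library(SparkR)
    sparkR.session()

    # The source data, as posted up-thread (source made explicit here).
    sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json", "json")

    # Option 1: remove any leftover output (including _temporary files) before
    # writing. unlink() runs on the driver, which sees the same NFS mount as
    # the workers.
    unlink("/nfspartition/sankar/banking_l1_v2.csv", recursive = TRUE)

    # Option 2: have Spark replace the existing output instead of appending.
    write.df(sankar2, "/nfspartition/sankar/banking_l1_v2.csv",
             source = "csv", mode = "overwrite", header = "true")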