I want to save an RDD to a local directory. I have tried the following and get an error:
r.saveAsTextFile("file:/home/cloudera/tmp/out1")
r.saveAsTextFile("file:///home/cloudera/tmp/out1")
r.saveAsTextFile("file:////home/cloudera/tmp/out1")
They all generate the following error:
15/01/12 08:31:10 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 5, master01.cloudera): java.io.IOException: Mkdirs failed to create file:/home/cloudera/temp/out1/_temporary/0/_temporary/attempt_201501120831_0001_m_000001_5 (exists=false, cwd=file:/var/run/spark/work/app-20150112080951-0002/0)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1056)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
The user ID that runs Spark and the driver program has full permissions on the
directory /home/cloudera/tmp/. I can cd there and run "mkdir out1" to create the
directory without any problem. I then removed the directory "out1" and ran
r.saveAsTextFile("file:/home/cloudera/tmp/out1")
I got the error above, but the directory "out1" was created. It looks like
r.saveAsTextFile(...) tries to create the subdirectories
out1/_temporary/0/_temporary/attempt_201501120831_0001_m_000001_5, which fails.
Has anybody successfully run r.saveAsTextFile(...) to save an RDD to the local
file system on Linux?
Ningjun
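
One possible reading of the trace above, not confirmed anywhere in this thread: the failing task ran on master01.cloudera with a working directory under /var/run/spark/work, so the mkdirs call was made by the executor process on that node, whose user and local directories may differ from those of the shell session. If the RDD is small enough to fit in driver memory, a workaround is to collect it and write the file from the driver. The sketch below uses only the JDK; the helper name writeLocal and the file name out1.txt are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Writes each element as one line of a single local file, creating parent
// directories as needed. This runs in the calling JVM only (the driver),
// so executors never touch the local path.
def writeLocal(lines: Seq[String], target: Path): Unit = {
  Option(target.getParent).foreach(p => Files.createDirectories(p))
  val body = lines.map(_ + "\n").mkString
  Files.write(target, body.getBytes(StandardCharsets.UTF_8))
}

// Usage in spark-shell with the RDD `r` from this thread (small RDDs only,
// because collect() pulls every element into driver memory):
// writeLocal(r.collect().toSeq, java.nio.file.Paths.get("/home/cloudera/tmp/out1.txt"))
```

Unlike saveAsTextFile, this produces one file rather than a directory of part files, and it does not scale to data that exceeds driver memory.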
-----Original Message-----
From: Sean Owen [mailto:[email protected]]
Sent: Monday, January 12, 2015 11:25 AM
To: Wang, Ningjun (LNG-NPV)
Cc: [email protected]
Subject: Re: Failed to save RDD as text file to local file system
I think you're confusing HDFS paths and local paths. You are cd'ing to a
directory and seem to want to write output there, but your path has no scheme
and defaults to being an HDFS path. When you use "file:" you seem to have a
permission error (perhaps).
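
To make the scheme distinction concrete, here is a minimal sketch that prints the scheme of each path form tried in this thread. It uses only java.net.URI, so it shows scheme parsing alone and not Hadoop's full path resolution. The helper name schemeOf and the hdfs:/// example path are hypothetical.

```scala
import java.net.URI

// Returns the URI scheme of a path string, or "(none)" when it has no scheme.
def schemeOf(path: String): String =
  Option(new URI(path).getScheme).getOrElse("(none)")

val candidates = Seq(
  "out1",                           // relative, no scheme
  "/home/cloudera/tmp/out1",        // absolute, no scheme
  "file:///home/cloudera/tmp/out1", // explicit local file system
  "hdfs:///user/cloudera/out1"      // hypothetical explicit HDFS path
)

candidates.foreach(p => println(s"$p -> ${schemeOf(p)}"))
```

Hadoop resolves a path with no scheme against the default file system configured in fs.defaultFS, which on a cluster set up for HDFS means the first two forms end up in HDFS rather than on the local disk.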
On Mon, Jan 12, 2015 at 4:21 PM, NingjunWang <[email protected]>
wrote:
> Prannoy
>
>
>
> I tried r.saveAsTextFile("home/cloudera/tmp/out1"); it returns
> without error. But where was it saved? The folder
> “/home/cloudera/tmp/out1” is not created.
>
>
>
> I also tried the following
>
> cd /home/cloudera/tmp/
>
> spark-shell
>
> scala> val r = sc.parallelize(Array("a", "b", "c"))
>
> scala> r.saveAsTextFile("out1")
>
>
>
> It does not return an error, but there is still no “out1” folder
> created under /home/cloudera/tmp/
>
>
>
> I tried giving an absolute path, but then I get an error:
>
>
>
> scala> r.saveAsTextFile("/home/cloudera/tmp/out1")
>
> org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>         at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>         at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>         at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
>         at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6286)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6268)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6220)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4087)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4057)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4030)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:787)
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
>
>
> Very frustrated. Please advise.
>
>
>
>
>
> Regards,
>
>
>
> Ningjun Wang
>
> Consulting Software Engineer
>
> LexisNexis
>
> 121 Chanlon Road
>
> New Providence, NJ 07974-1541
>
>
>
> From: Prannoy [via Apache Spark User List] [mailto:ml-node+[hidden
> email]]
> Sent: Monday, January 12, 2015 4:18 AM
>
>
> To: Wang, Ningjun (LNG-NPV)
> Subject: Re: Failed to save RDD as text file to local file system
>
>
>
> Have you tried simply giving the path where you want to save the file?
>
>
>
> For instance, in your case just do
>
>
>
> r.saveAsTextFile("home/cloudera/tmp/out1")
>
>
>
> Don't use the "file:" prefix.
>
>
>
> This will create a folder named out1. saveAsTextFile always writes
> by creating a directory; it does not write the data into a single file.
>
>
>
> In case you need a single file, you can use the copyMerge API in FileUtil.
>
>
>
> FileUtil.copyMerge(fs, new Path("home/cloudera/tmp/out1"),
> fs, new Path("home/cloudera/tmp/out2"), true, conf, null);
>
> Now out2 will be a single file containing your data.
>
> fs is the FileSystem object for your local file system, and conf is
> the Hadoop Configuration.
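
For output that already sits in a local directory, the same merge step can be sketched without Hadoop on the classpath. This is not FileUtil.copyMerge but a plain-JDK stand-in that concatenates the part-* files in name order; the helper name mergeParts is hypothetical, and unlike the copyMerge call quoted above it leaves the source directory in place.

```scala
import java.io.File
import java.nio.file.Files

// Concatenates every part-* file in srcDir, in name order, into dstFile.
// Marker files such as _SUCCESS and hidden .crc files are skipped because
// their names do not start with "part-".
def mergeParts(srcDir: File, dstFile: File): Unit = {
  val parts = Option(srcDir.listFiles()).getOrElse(Array.empty[File])
    .filter(f => f.isFile && f.getName.startsWith("part-"))
    .sortBy(_.getName)
  val out = Files.newOutputStream(dstFile.toPath) // creates or truncates dstFile
  try parts.foreach(p => Files.copy(p.toPath, out))
  finally out.close()
}

// Usage, assuming out1 was written to the local disk of this machine:
// mergeParts(new File("/home/cloudera/tmp/out1"), new File("/home/cloudera/tmp/out2"))
```

Sorting by file name keeps the partitions in order, because the part files are numbered with zero padding (part-00000, part-00001, and so on).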
>
> Thanks
>
>
>
>
>
> On Sat, Jan 10, 2015 at 1:36 AM, NingjunWang [via Apache Spark User
> List] <[hidden email]> wrote:
>
> No, do you have any idea?
>
>
>
> Regards,
>
>
>
> Ningjun Wang
>
> Consulting Software Engineer
>
> LexisNexis
>
> 121 Chanlon Road
>
> New Providence, NJ 07974-1541
>
>
>
> From: firemonk9 [via Apache Spark User List] [mailto:[hidden
> email][hidden email]]
> Sent: Friday, January 09, 2015 2:56 PM
> To: Wang, Ningjun (LNG-NPV)
> Subject: Re: Failed to save RDD as text file to local file system
>
>
>
> Have you found any resolution for this issue ?
>
> ________________________________
>
> If you reply to this email, your message will be added to the
> discussion
> below:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Failed-to-save-RDD
> -as-text-file-to-local-file-system-tp21050p21067.html
>
> To unsubscribe from Failed to save RDD as text file to local file
> system, click here.
> NAML
>
>
>
>
>
> ________________________________
> View this message in context: RE: Failed to save RDD as text file to
> local file system
>
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]