S3 does not have the concept of a "directory". An S3 bucket only holds files (objects). The Hadoop filesystem is mapped onto a bucket and uses Hadoop-specific (or rather tool-specific: s3n uses the jets3t library) conventions (hacks) to fake directories, such as a key ending with a slash ("filename/") or, with s3n, a "filename_$folder$" marker object (these are leaky abstractions, google that if you ever have some spare time :p). S3 simply doesn't (and shouldn't) know about these conventions. Again, a bucket just holds a shitload of files. This might seem inconvenient, but directories are a really bad idea for scalable storage. However, "folder-like" permissions can be set through IAM: http://docs.aws.amazon.com/AmazonS3/latest/dev/example-policies-s3.html#iam-policy-ex1
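To make the flat-namespace point concrete, here is a minimal sketch (plain Python, no real S3 client — the bucket is modelled as a dict and all key names are illustrative):

```python
# Model an S3 bucket as a flat mapping from full key names to data.
# There is no hierarchy: slashes are just characters in the key.
bucket = {}

# "Saving into a directory" is really just writing a key that contains slashes.
bucket["dev/output/part-00000"] = b"line 1\nline 2\n"

# No parent "directory" objects are created as a side effect.
print("dev" in bucket)          # False
print("dev/output" in bucket)   # False

# Tools fake directories with marker objects, e.g. s3n's "_$folder$" hack:
bucket["dev_$folder$"] = b""

# "Listing a directory" is a client-side prefix scan over the flat key space.
listing = sorted(k for k in bucket if k.startswith("dev/"))
print(listing)                  # ['dev/output/part-00000']
```

Note that the marker object "dev_$folder$" does not even show up under the "dev/" prefix — the "directory" and its "contents" are unrelated objects as far as S3 is concerned.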
Summarizing: by setting permissions on /dev you set permissions on that object. It has no effect on the file /dev/output, which is, as far as S3 cares, another object that happens to share part of its object name with /dev.

Thomas Demoor
skype: demoor.thomas
mobile: +32 497883833

On Tue, Jan 27, 2015 at 6:33 AM, Chen, Kevin <kevin.c...@neustar.biz> wrote:

> When Spark saves an RDD to a text file, the directory must not exist
> upfront. It will create the directory and write the data to part-0000 under
> it. In my use case, I created a directory dev in the bucket
> //nexgen-software/dev. I expected it to create output under dev and a
> part-0000 under output. But it gave me an exception, as I had only given
> write permission to dev, not the bucket. If I open up write permission on
> the bucket, it works. But it did not create the output directory under dev;
> rather, it creates another dev/output directory under the bucket. I just
> want to know if it is possible to have the output directory created under
> the dev directory I created upfront.
>
> From: Nick Pentreath <nick.pentre...@gmail.com>
> Date: Monday, January 26, 2015 9:15 PM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: SaveAsTextFile to S3 bucket
>
> Your output folder specifies
>
> rdd.saveAsTextFile("s3n://nexgen-software/dev/output");
>
> So it will try to write to /dev/output, which is as expected. If you
> create the directory /dev/output upfront in your bucket and try to save
> to that (empty) directory, what is the behaviour?
>
> On Tue, Jan 27, 2015 at 6:21 AM, Chen, Kevin <kevin.c...@neustar.biz>
> wrote:
>
>> Does anyone know if I can save an RDD as a text file to a pre-created
>> directory in an S3 bucket?
>>
>> I have a directory created in an S3 bucket: //nexgen-software/dev
>>
>> When I tried to save an RDD as a text file in this directory:
>> rdd.saveAsTextFile("s3n://nexgen-software/dev/output");
>>
>> I got the following exception at runtime:
>>
>> Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception:
>> org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/dev' -
>> ResponseCode=403, ResponseMessage=Forbidden
>>
>> I have verified that /dev has write permission. However, if I grant the
>> bucket //nexgen-software write permission, I don't get the exception. But
>> the output is not created under dev. Rather, a different /dev/output
>> directory is created directly in the bucket (//nexgen-software). Is this
>> how saveAsTextFile behaves on S3? Is there any way I can have output
>> created under a pre-defined directory?
>>
>> Thanks in advance.
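For the permissions part of the question: instead of granting write on the /dev marker object, an IAM policy can grant access to everything under the dev/ key prefix. A sketch (bucket name taken from the thread; the exact set of actions the s3n connector needs may be larger than shown — its HEAD and listing calls, which produced the 403 above, need at least ListBucket on the bucket):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::nexgen-software/dev/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::nexgen-software",
      "Condition": {"StringLike": {"s3:prefix": ["dev/*"]}}
    }
  ]
}
```

Note that both statements are scoped by key prefix, not by any "directory": that is exactly the folder-like permission scheme described in the AWS example policies linked above.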