On Tue, May 31, 2016 at 7:05 AM, Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi,
>
> And on another note, is it necessary to use s3a? Why not just use s3://? I
> prefer to use s3a:// only when writing files to S3 from EMR.
>

Does Spark support s3://? I am using s3a over s3n because I needed to
access files larger than 5 GB.

I am on a local cluster with (most likely) no firewall restrictions, but
I am now considering spawning an EMR cluster. The Spark version is 1.6.
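For reference, here is a minimal sketch for a Spark 1.6 standalone cluster. It passes the same s3a settings shown in the original message quoted below, but through SparkConf instead of sc.hadoopConfiguration. Spark copies every property prefixed with "spark.hadoop." into the Hadoop Configuration it creates, so the settings are in place as soon as the SparkContext starts. This has not been verified to fix the 403. The object name, the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables, and reading the path from args(0) are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object S3aReadSketch {
  // Adds the s3a settings to a SparkConf. Keys prefixed with "spark.hadoop."
  // are copied by Spark into the Hadoop Configuration it builds.
  def withS3a(conf: SparkConf, accessKey: String, secretKey: String): SparkConf =
    conf
      .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .set("spark.hadoop.fs.s3a.access.key", accessKey)
      .set("spark.hadoop.fs.s3a.secret.key", secretKey)

  def main(args: Array[String]): Unit = {
    // Assumption: the path is the first argument, e.g. s3a://<bucket name>/<file path>
    val path = args(0)
    // Assumption: credentials come from these environment variables on the driver
    // (sys.env(...) throws if either variable is missing)
    val accessKey = sys.env("AWS_ACCESS_KEY_ID")
    val secretKey = sys.env("AWS_SECRET_ACCESS_KEY")

    // new SparkConf() also loads the settings passed by spark-submit (master, etc.)
    val conf = withS3a(new SparkConf().setAppName("s3a-read-sketch"), accessKey, secretKey)
    val sc = new SparkContext(conf)
    try {
      // textFile reads the path using the Hadoop configuration Spark built from conf
      println(s"lines: ${sc.textFile(path).count()}")
    } finally {
      sc.stop()
    }
  }
}
```

This would be submitted with spark-submit, adding the aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.2.jar jars from the setup below (for example via --jars) and passing s3a://<bucket name>/<file path> as the argument.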



>
> Regards,
> Gourav Sengupta
>
> On Tue, May 31, 2016 at 12:04 PM, Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> Is your Spark cluster running on EMR, on a self-created Spark cluster
>> on EC2, or on a local cluster behind a firewall? Which Spark version
>> are you using?
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Sun, May 29, 2016 at 10:55 PM, Mayuresh Kunjir <mayur...@cs.duke.edu>
>> wrote:
>>
>>> I'm running into permission issues while accessing data stored in an S3
>>> bucket through the s3a file system from a local Spark cluster. Has anyone
>>> had success with this?
>>>
>>> My setup is:
>>> - Spark 1.6.1 compiled against Hadoop 2.7.2
>>> - aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.2.jar in the classpath
>>> - Spark's Hadoop configuration is as follows:
>>>
>>>
>>> sc.hadoopConfiguration.set("fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>
>>> sc.hadoopConfiguration.set("fs.s3a.access.key", <access>)
>>>
>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", <secret>)
>>>
>>> (The secret key does not contain any '/' characters, which others have
>>> reported can cause problems.)
>>>
>>>
>>> I have configured my S3 bucket to grant the necessary permissions. (
>>> https://sparkour.urizone.net/recipes/configuring-s3/)
>>>
>>>
>>> What works: listing, reading from, and writing to s3a using the hadoop
>>> command, e.g. hadoop dfs -ls s3a://<bucket name>/<file path>
>>>
>>>
>>> What doesn't work: Reading from s3a using Spark's textFile API. Each
>>> task throws an exception which says *Forbidden Access(403)*.
>>>
>>>
>>> Some online documents suggest using IAM roles to grant permissions to a
>>> cluster running on AWS, but I would like a solution for my local
>>> standalone cluster.
>>>
>>>
>>> Any help would be appreciated.
>>>
>>>
>>> Regards,
>>>
>>> ~Mayuresh
>>>
>>
>>
>
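The original message reports that the hadoop command can list the bucket while Spark's textFile gets a 403. One way to narrow this down is to list the same path from inside the Spark driver, using the exact Hadoop Configuration Spark uses. The sketch below does that with the Hadoop FileSystem API. The object and method names are hypothetical. It runs only in the driver JVM, so it says nothing directly about the executors.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object S3aListSketch {
  // Hypothetical helper: lists the entries under `uri` using the given Hadoop
  // Configuration (e.g. sc.hadoopConfiguration in spark-shell).
  def list(uri: String, hadoopConf: Configuration): Seq[String] = {
    // Resolves the FileSystem implementation for the URI scheme (s3a://, file://, ...)
    val fs = FileSystem.get(new URI(uri), hadoopConf)
    // Returns the full path of each entry directly under `uri`
    fs.listStatus(new Path(uri)).map(_.getPath.toString).toSeq
  }
}
```

In spark-shell this could be called as S3aListSketch.list("s3a://<bucket name>/<file path>", sc.hadoopConfiguration). Suppose the listing succeeds in the driver while textFile tasks still fail with 403. That would suggest, though not prove, that the problem lies in how the credentials or jars reach the executors rather than in the bucket permissions.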
