Re: s3a file system and spark deployment mode

Steve Loughran Fri, 16 Oct 2015 02:26:20 -0700

> On 15 Oct 2015, at 19:04, Scott Reynolds <sreyno...@twilio.com> wrote:
> 
> List,
> 
> Right now we build our spark jobs with the s3a hadoop client. We do this 
> because our machines are only allowed to use IAM access to the s3 store. We 
> can build our jars with the s3a filesystem and the aws sdk just fine and these 
> jars run great in *client mode*. 
> 
> We would like to move from client mode to cluster mode as that will allow us 
> to be more resilient to driver failure. In order to do this either:
> 1. the jar file has to be on worker's local disk
> 2. the jar file is in shared storage (s3a)
> 
> We would like to put the jar file in s3 storage, but when we give the jar 
> path as s3a://......, the worker node doesn't have the hadoop s3a and aws sdk 
> in its classpath / uber jar.
> 
> Other than building spark with those two dependencies, what other options do 
> I have? We are using 1.5.1 so SPARK_CLASSPATH is no longer a thing.
> 
> Need to get s3a access to both the master (so that we can log spark event log 
> to s3) and to the worker processes (driver, executor).
> 
> Looking for ideas before just adding the dependencies to our spark build and 
> calling it a day.



you can use --jars to add these, e.g.

--jars hadoop-aws.jar,aws-java-sdk-s3


as others have warned, you need Hadoop 2.7.1 for s3a to work properly
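
for illustration, a fuller cluster-mode submission with --jars might look roughly like the sketch below. Everything specific in it is an assumption rather than something from this thread: the jar locations, the bucket, the main class, and the standalone master URL are all placeholders. The script only prints the command so it can be checked first, and it does not by itself show that the worker can fetch the application jar over s3a.

```shell
#!/usr/bin/env bash
# Sketch only: every path, class name, and URL below is a placeholder.
HADOOP_AWS_JAR="/opt/jars/hadoop-aws.jar"      # assumed local path
AWS_SDK_JAR="/opt/jars/aws-java-sdk-s3.jar"    # assumed local path
APP_JAR="s3a://example-bucket/jobs/app.jar"    # hypothetical bucket and key

# Build the spark-submit command as an array so quoting stays intact.
cmd=(
  spark-submit
  --master "spark://master.example.com:7077"
  --deploy-mode cluster
  --class com.example.Main
  --jars "${HADOOP_AWS_JAR},${AWS_SDK_JAR}"
  "${APP_JAR}"
)

# Dry run: print the command for review instead of executing it.
printf '%s ' "${cmd[@]}"
echo
```

once the printed command looks right, running "${cmd[@]}" in place of the printf line would submit it. The jar versions should match the Hadoop version your Spark build uses.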

