Are you using EMR? If so, you can install Hadoop 2.6.0 along with Spark 1.5.1 on your EMR cluster. That puts the s3a jars on the worker nodes, which makes them available to your application.
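
If you are not on EMR, one thing to try is copying the hadoop-aws and AWS SDK jars to the same local path on every node and pointing Spark's extraClassPath settings at them. Below is a rough sketch of a helper that assembles such a spark-submit command. The jar paths, bucket, master URL and class name are hypothetical placeholders, and the jar versions are my assumption of what pairs with Hadoop 2.6.0. I have not verified that this also covers the worker daemon fetching an s3a:// application jar in cluster mode, so treat it as a starting point rather than a confirmed fix.

```python
def build_spark_submit(app_jar, main_class, master, extra_jars):
    """Return a spark-submit argument list for cluster mode.

    extra_jars must already exist at the same local path on every node;
    this helper only builds the command, it does not copy anything.
    """
    classpath = ":".join(extra_jars)
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        "--class", main_class,
        # Prepend the s3a client jars to the driver and executor classpaths.
        "--conf", "spark.driver.extraClassPath=" + classpath,
        "--conf", "spark.executor.extraClassPath=" + classpath,
        # spark.hadoop.* properties are passed through to the Hadoop config.
        "--conf", "spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem",
        app_jar,
    ]


if __name__ == "__main__":
    # All values below are hypothetical and only illustrate the shape.
    cmd = build_spark_submit(
        app_jar="s3a://example-bucket/app.jar",
        main_class="com.example.Main",
        master="spark://master-host:7077",
        extra_jars=[
            "/opt/s3a/hadoop-aws-2.6.0.jar",   # assumed version
            "/opt/s3a/aws-java-sdk-1.7.4.jar",  # assumed version
        ],
    )
    print(" ".join(cmd))
```

The helper only builds the argument list, so you can inspect it or hand it to subprocess yourself once the paths match your machines.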

On Thu, Oct 15, 2015 at 11:04 AM, Scott Reynolds <sreyno...@twilio.com> wrote:

> List,
>
> Right now we build our spark jobs with the s3a hadoop client. We do this
> because our machines are only allowed to use IAM access to the s3 store. We
> can build our jars with the s3a filesystem and the aws sdk just fine and
> these jars run great in *client mode*.
>
> We would like to move from client mode to cluster mode as that will allow
> us to be more resilient to driver failure. In order to do this either:
> 1. the jar file has to be on worker's local disk
> 2. the jar file is in shared storage (s3a)
>
> We would like to put the jar file in s3 storage, but when we give the jar
> path as s3a://......, the worker node doesn't have the hadoop s3a and aws
> sdk in its classpath / uber jar.
>
> Other than building spark with those two dependencies, what other options
> do I have? We are using 1.5.1 so SPARK_CLASSPATH is no longer a thing.
>
> Need to get s3a access to both the master (so that we can log spark event
> log to s3) and to the worker processes (driver, executor).
>
> Looking for ideas before just adding the dependencies to our spark build
> and calling it a day.