I have been trying to work around a similar problem with my Typesafe config *.conf files seemingly not appearing on the executors. (Though now that I think about it, it's not because the files are absent from the JAR, but because the -Dconf.resource system property I pass to the master obviously doesn't get relayed to the workers.)
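If it helps to compare notes, this is roughly what I'm planning to try on my side, passing the system property to both the driver and the executors (just a sketch; the conf.resource value, class name, and jar name are placeholders from my setup, not yours):

    ./bin/spark-submit \
      --master spark://master-amazonaws.com:7077 \
      --class com.example.MyApp \
      --driver-java-options "-Dconf.resource=prod.conf" \
      --conf "spark.executor.extraJavaOptions=-Dconf.resource=prod.conf" \
      my-app.jar

As far as I can tell, --driver-java-options and spark.executor.extraJavaOptions are the documented ways to get JVM options onto the driver and executor sides respectively, so this should carry the property to both.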
What happens if you do something like this:

nohup ./bin/spark-submit --verbose --jars lib/app.jar \
  --master spark://master-amazonaws.com:7077 \
  --class com.elsevier.spark.SparkSync \
  --conf "spark.executor.extraJavaOptions=-Ds3service.server-side-encryption=AES256" \
  lib/app.jar > out.log &

(I bet this will fix my problem too.)

On Wed Jan 28 2015 at 10:17:09 AM Kohler, Curt E (ELS-STL) <
c.koh...@elsevier.com> wrote:

> So, following up on your suggestion, I'm still having some problems
> getting the configuration changes recognized when my job runs.
>
> I've added jets3t.properties to the root of the application jar file that
> I submit to Spark (via spark-submit).
>
> I've verified that jets3t.properties is at the root of my application
> jar by executing jar tf app.jar.
>
> I submit my job to the cluster with the following command:
>
> nohup ./bin/spark-submit --verbose --jars lib/app.jar --master
> spark://master-amazonaws.com:7077 --class com.elsevier.spark.SparkSync
> lib/app.jar > out.log &
>
> In the mainline of app.jar, I also added the following code:
>
> log.info(System.getProperty("java.class.path"));
> InputStream in =
> SparkSync.class.getClassLoader().getResourceAsStream("jets3t.properties");
> log.info(getStringFromInputStream(in));
>
> And I can see that the jets3t.properties I provided is found, because it
> outputs:
>
> s3service.server-side-encryption=AES256
>
> It's almost as if the hadoop/jets3t piece has already been initialized
> and is ignoring my jets3t.properties.
>
> I can get this all working inside of Eclipse by including the folder
> containing my jets3t.properties. But I can't get things working when
> submitting this to a Spark stand-alone cluster.
>
> Any insights would be appreciated.
> ------------------------------
> *From:* Thomas Demoor <thomas.dem...@amplidata.com>
> *Sent:* Tuesday, January 27, 2015 4:41 AM
> *To:* Kohler, Curt E (ELS-STL)
> *Cc:* user@spark.apache.org
> *Subject:* Re: Spark and S3 server side encryption
>
> Spark uses the Hadoop filesystems.
>
> I assume you are trying to use s3n://, which, under the hood, uses the
> third-party jets3t library. It is configured through the
> jets3t.properties file (google "hadoop s3n jets3t"), which you should put
> on Spark's classpath. The setting you are looking for is
> s3service.server-side-encryption.
>
> The latest version of Hadoop (2.6) introduces a new and improved s3a://
> filesystem which has the official SDK from Amazon under the hood.
>
> On Mon, Jan 26, 2015 at 10:01 PM, curtkohler <c.koh...@elsevier.com>
> wrote:
>
>> We are trying to create a Spark job that writes out a file to S3 and
>> leverages S3's server-side encryption for sensitive data. Typically this
>> is accomplished by setting the appropriate header on the put request,
>> but it isn't clear whether this capability is exposed in the
>> Spark/Hadoop APIs. Does anyone have any suggestions?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-S3-server-side-encryption-tp21377.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
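One more debugging thought for Curt: the getResourceAsStream check quoted above runs in the driver JVM, so it only proves the file is visible on the driver's classpath. Something along these lines (a rough sketch against Spark's Java API; the class name and messages are made up) would run the same lookup inside the executors, which should tell you whether the file is even reaching them:

    import java.io.InputStream;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class Jets3tPropsCheck {
      public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("jets3t-props-check");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Run the same classpath lookup on the executors that the driver does.
        List<String> results = sc.parallelize(Arrays.asList(1, 2, 3, 4), 4)
            .map(i -> {
              InputStream in = Jets3tPropsCheck.class.getClassLoader()
                  .getResourceAsStream("jets3t.properties");
              boolean found = (in != null);
              if (in != null) in.close();
              return found ? "jets3t.properties found on executor"
                           : "jets3t.properties missing on executor";
            })
            .collect();

        results.forEach(System.out::println);
        sc.stop();
      }
    }

If the executors report the file as present, that would point toward what you already suspect: jets3t being configured earlier, or from a different classloader, than the one your application jar ends up on, rather than the file simply being missing.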