Thanks, Akhil.

We're trying the conf.setExecutorEnv() approach since we already have the
environment variables set. For system properties we'd go the
conf.set("spark.xxxx") route.

We were concerned because the following kind of invocation did not work,
which this blog post seems to confirm (
http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html):

$SPARK_HOME/bin/spark-submit \
  --class "com.acme.Driver" \
  --conf spark.executorEnv.VAR1=VAL1 \
  --conf spark.executorEnv.VAR2=VAL2 \
  .....................

The code running on the workers does not see these variables.
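
For example, on the executor side a check like the following (sketch only;
rdd stands for whatever JavaRDD<String> we happen to be working with)
comes back null for VAR1 and VAR2:

JavaRDD<String> checked = rdd.map(line -> {
    // Runs on the executor; System.getenv("VAR1") is null even though
    // VAR1 was passed via --conf spark.executorEnv.VAR1=VAL1 above.
    return line + " VAR1=" + System.getenv("VAR1");
});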


On Fri, Jul 10, 2015 at 4:03 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> It basically filters out everything that doesn't start with "spark."
> <https://github.com/apache/spark/blob/658814c898bec04c31a8e57f8da0103497aac6ec/core/src/main/scala/org/apache/spark/SparkConf.scala#L314>,
> so it is necessary to keep the "spark." prefix in the property name.
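>
> A quick illustration of that filtering on the system-property side
> (a sketch, with hypothetical property names):
>
>   System.setProperty("spark.myapp.foo", "bar");  // picked up by SparkConf
>   System.setProperty("myapp.foo", "bar");        // filtered out, never reaches the conf
>   SparkConf conf = new SparkConf();              // loads "spark."-prefixed system properties
>   boolean a = conf.contains("spark.myapp.foo");  // true
>   boolean b = conf.contains("myapp.foo");        // false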
>
> Thanks
> Best Regards
>
> On Fri, Jul 10, 2015 at 12:06 AM, dgoldenberg <dgoldenberg...@gmail.com>
> wrote:
>
>> I have about 20 environment variables to pass to my Spark workers. Even
>> though they're in the init scripts on the Linux box, the workers don't see
>> these variables.
>>
>> Does Spark do something to shield itself from what may be defined in the
>> environment?
>>
>> I see multiple pieces of info on how to pass the env vars into workers and
>> they seem dated and/or unclear.
>>
>> Here:
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-pass-config-variables-to-workers-tt5780.html
>>
>> SparkConf conf = new SparkConf();
>> conf.set("spark.myapp.myproperty", "propertyValue");
>> OR
>> set them in spark-defaults.conf, as in
>> spark.config.one value
>> spark.config.two value2
>>
>> In another posting:
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-set-environment-variable-for-a-spark-job-tt3180.html
>>
>> conf.setExecutorEnv("ORACLE_HOME", myOraHome)
>> conf.setExecutorEnv("SPARK_JAVA_OPTS",
>> "-Djava.library.path=/my/custom/path")
>>
>> The configuration guide talks about
>> "spark.executorEnv.[EnvironmentVariableName] -- Add the environment
>> variable
>> specified by EnvironmentVariableName to the Executor process. The user can
>> specify multiple of these to set multiple environment variables."
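>>
>> i.e., presumably something like this in spark-defaults.conf (using one of
>> my own variables as an example):
>>
>> spark.executorEnv.MYPREFIX_MY_VAR_1  some-value
>>
>> or on the spark-submit command line:
>>
>> --conf spark.executorEnv.MYPREFIX_MY_VAR_1=some-value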
>>
>> Then there are mentions of SPARK_JAVA_OPTS, which seems to be deprecated
>> (?).
>>
>> What is the easiest/cleanest approach here?  Ideally, I'd rather not burden
>> my driver program with explicit knowledge of all the env vars that are
>> needed on the worker side.  I'd also like to avoid having to jam them into
>> spark-defaults.conf since they're already set in the system init scripts,
>> so why duplicate them?
>>
>> I suppose one approach would be to namespace all my vars to start with a
>> well-known prefix, then cycle through the env in the driver and stuff all
>> these variables into the Spark context.  If I'm doing that, would I want to
>>
>> conf.set("spark.myapp.myproperty", "propertyValue");
>>
>> and is "spark." necessary? or was that just part of the example?
>>
>> or would I want to
>>
>> conf.setExecutorEnv("MYPREFIX_MY_VAR_1", "some-value");
>>
>> Thanks.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/What-is-a-best-practice-for-passing-environment-variables-to-Spark-workers-tp23751.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
