Well, even before spark-submit, the standard way of setting Spark
configurations is to create a new SparkConf, set the values on the conf,
and pass it to the SparkContext in your application. It's true that this
involves "hard-coding" these configurations in your application, but these
configurations are intended to be application-level settings anyway, rather
than cluster-wide settings. Environment variables are not really ideal for
this purpose, though they are an easy way to change these settings quickly.
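
For concreteness, here is a minimal sketch of that approach (the app name,
master URL, and timeout value are placeholders, not recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    object MyApp {
      def main(args: Array[String]): Unit = {
        // Application-level settings are set on the SparkConf...
        val conf = new SparkConf()
          .setAppName("MyApp")
          .setMaster("spark://master-host:7077")  // placeholder standalone master URL
          .set("spark.akka.askTimeout", "120")    // example application-level setting
        // ...and the SparkContext is constructed from it.
        val sc = new SparkContext(conf)
        // application logic goes here
        sc.stop()
      }
    }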


2014-06-20 14:03 GMT-07:00 Koert Kuipers <ko...@tresata.com>:

> thanks for the detailed answer andrew. that's helpful.
>
> i think the main thing that's bugging me is that there is no simple way for
> an admin to always set something on the executors for a production
> environment (an akka timeout comes to mind). yes i could use
> spark-defaults for that, although that means everything must be submitted
> through spark-submit, which is fairly new and i am not sure how much we
> will use that yet. i will look into that some more.
>
>
> On Thu, Jun 19, 2014 at 6:56 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> for a jvm application it's not very appealing to me to use
>> spark-submit.... my application uses hadoop, so i should use "hadoop jar", and my
>> application uses spark, so it should use "spark-submit". if i add a piece
>> of code that uses some other system, there will be yet another suggested way
>> to launch it. that's not very scalable, since i can only launch it one way
>> in the end...
>>
>>
>> On Thu, Jun 19, 2014 at 4:58 PM, Andrew Or <and...@databricks.com> wrote:
>>
>>> Hi Koert and Lukasz,
>>>
>>> The recommended way of not hard-coding configurations in your
>>> application is through conf/spark-defaults.conf as documented here:
>>> http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties.
>>> However, this is only applicable to
>>> spark-submit, so this may not be useful to you.
>>>
>>> Depending on how you launch your Spark applications, you can work around
>>> this by manually specifying these configs as -Dspark.x=y
>>> in your java command to launch Spark. This is actually how
>>> SPARK_JAVA_OPTS used to work before 1.0. Note that spark-submit does
>>> essentially the same thing, but sets these properties programmatically
>>> by reading from the conf/spark-defaults.conf file and calling
>>> System.setProperty("spark.x", "y").
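>>>
>>> To make that concrete, here is a rough sketch (the property name and
>>> value are only examples): any spark.* Java system property, whether set
>>> with -Dspark.x=y on the launch command or via System.setProperty before
>>> the SparkContext is created, is picked up automatically when a new
>>> SparkConf is constructed.
>>>
>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>
>>>   object SystemPropsExample {
>>>     def main(args: Array[String]): Unit = {
>>>       // Equivalent to launching the JVM with -Dspark.akka.askTimeout=120
>>>       System.setProperty("spark.akka.askTimeout", "120")
>>>       // A SparkConf built with defaults loads all spark.* system properties
>>>       val conf = new SparkConf()
>>>         .setAppName("SystemPropsExample")   // placeholder app name
>>>         .setMaster("local[2]")              // placeholder master
>>>       val sc = new SparkContext(conf)
>>>       println(sc.getConf.get("spark.akka.askTimeout"))  // prints 120
>>>       sc.stop()
>>>     }
>>>   }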
>>>
>>> Note that spark.executor.extraJavaOptions is not intended for Spark
>>> configuration (see
>>> http://spark.apache.org/docs/latest/configuration.html).
>>> SPARK_DAEMON_JAVA_OPTS, as you pointed out, is for Spark daemons like
>>> the standalone master, worker, and the history server;
>>> it is also not intended for Spark configurations to be picked up by
>>> Spark executors and drivers. In general, any reference to "java opts"
>>> in any variable or config refers to Java options, as the name implies,
>>> not Spark configuration. Unfortunately, it just so happened that we
>>> used to mix the two in the same environment variable before 1.0.
>>>
>>> Is there a reason you're not using spark-submit? Is it for legacy
>>> reasons? As of 1.0, most changes to launching Spark applications
>>> will be done through spark-submit, so you may miss out on relevant new
>>> features or bug fixes.
>>>
>>> Andrew
>>>
>>>
>>>
>>> 2014-06-19 7:41 GMT-07:00 Koert Kuipers <ko...@tresata.com>:
>>>
>>> still struggling with SPARK_JAVA_OPTS being deprecated. i am using spark
>>>> standalone.
>>>>
>>>> for example, if i have an akka timeout setting that i would like to be
>>>> applied to every piece of the spark framework (so spark master, spark
>>>> workers, spark executor sub-processes, spark-shell, etc.), i used to do
>>>> that with SPARK_JAVA_OPTS. now i am unsure.
>>>>
>>>> SPARK_DAEMON_JAVA_OPTS works for the master and workers, but not for
>>>> the spark-shell i think? i tried using SPARK_DAEMON_JAVA_OPTS, and it does
>>>> not seem that useful. for example, for a worker it does not apply the
>>>> settings to the executor sub-processes, while SPARK_JAVA_OPTS does
>>>> do that. so it seems like SPARK_JAVA_OPTS is my only way to change settings
>>>> for the executors, yet it's deprecated?
>>>>
>>>>
>>>> On Wed, Jun 11, 2014 at 10:59 PM, elyast <lukasz.jastrzeb...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I tried to use SPARK_JAVA_OPTS in spark-env.sh as well as the
>>>>> conf/java-opts file to set additional Java system properties. In this
>>>>> case I could connect to Tachyon without any problem.
>>>>>
>>>>> However, when I tried setting the executor and driver extraJavaOptions
>>>>> in spark-defaults.conf, it doesn't work.
>>>>>
>>>>> I suspect the root cause may be the following:
>>>>>
>>>>> SparkSubmit doesn't fork an additional JVM to actually run either the
>>>>> driver or executor process, and the additional system properties are
>>>>> set after the JVM is created and other classes are loaded. It may
>>>>> happen that the Tachyon CommonConf class is already loaded, and since
>>>>> it's a singleton it won't pick up any changes to system properties.
>>>>>
>>>>> Please let me know what you think.
>>>>>
>>>>> Can I use conf/java-opts, since it's not really documented anywhere?
>>>>>
>>>>> Best regards
>>>>> Lukasz
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
