Well, even before spark-submit, the standard way of setting Spark configurations has been to create a new SparkConf, set the values on it, and pass it to the SparkContext in your application. It's true that this involves "hard-coding" these configurations in your application, but these configurations are intended to be application-level settings anyway, rather than cluster-wide settings. Environment variables are not really ideal for this purpose, though they are an easy way to change these settings quickly.
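For example, a minimal sketch of that pattern (the property names and values below, such as the akka timeout and the app name, are just placeholders for whatever your application actually needs):

    import org.apache.spark.{SparkConf, SparkContext}

    // Application-level settings go on the SparkConf before the context is created.
    val conf = new SparkConf()
      .setAppName("MyApp")                       // placeholder app name
      .set("spark.akka.timeout", "200")          // e.g. the akka timeout discussed below
      .set("spark.executor.memory", "2g")        // placeholder value

    // The SparkContext picks these settings up when it starts.
    val sc = new SparkContext(conf)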
2014-06-20 14:03 GMT-07:00 Koert Kuipers <ko...@tresata.com>:

> Thanks for the detailed answer, Andrew. That's helpful.
>
> I think the main thing that's bugging me is that there is no simple way
> for an admin to always set something on the executors for a production
> environment (an akka timeout comes to mind). Yes, I could use
> spark-defaults for that, although that means everything must be submitted
> through spark-submit, which is fairly new and I am not sure how much we
> will use it yet. I will look into that some more.
>
>
> On Thu, Jun 19, 2014 at 6:56 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> For a JVM application it's not very appealing to me to use spark-submit:
>> my application uses Hadoop, so I should use "hadoop jar", and my
>> application uses Spark, so I should use "spark-submit". If I add a piece
>> of code that uses some other system, there will be yet another suggested
>> way to launch it. That's not very scalable, since I can only launch it
>> one way in the end...
>>
>>
>> On Thu, Jun 19, 2014 at 4:58 PM, Andrew Or <and...@databricks.com> wrote:
>>
>>> Hi Koert and Lukasz,
>>>
>>> The recommended way of not hard-coding configurations in your
>>> application is through conf/spark-defaults.conf, as documented here:
>>> http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties.
>>> However, this is only applicable to spark-submit, so it may not be
>>> useful to you.
>>>
>>> Depending on how you launch your Spark applications, you can work
>>> around this by manually specifying these configs as -Dspark.x=y in the
>>> java command you use to launch Spark. This is actually how
>>> SPARK_JAVA_OPTS used to work before 1.0. Note that spark-submit does
>>> essentially the same thing, but sets these properties programmatically
>>> by reading from the conf/spark-defaults.conf file and calling
>>> System.setProperty("spark.x", "y").
>>>
>>> Note that spark.executor.extraJavaOptions is not intended for Spark
>>> configuration (see
>>> http://spark.apache.org/docs/latest/configuration.html).
>>> SPARK_DAEMON_JAVA_OPTS, as you pointed out, is for Spark daemons like
>>> the standalone master, worker, and the history server; it is also not
>>> intended for Spark configurations to be picked up by Spark executors
>>> and drivers. In general, any reference to "java opts" in any variable
>>> or config refers to java options, as the name implies, not Spark
>>> configuration. Unfortunately, it just so happened that we used to mix
>>> the two in the same environment variable before 1.0.
>>>
>>> Is there a reason you're not using spark-submit? Is it for legacy
>>> reasons? As of 1.0, most changes to launching Spark applications will
>>> be done through spark-submit, so you may miss out on relevant new
>>> features or bug fixes.
>>>
>>> Andrew
>>>
>>>
>>> 2014-06-19 7:41 GMT-07:00 Koert Kuipers <ko...@tresata.com>:
>>>
>>>> Still struggling with SPARK_JAVA_OPTS being deprecated. I am using
>>>> spark standalone.
>>>>
>>>> For example, if I have an akka timeout setting that I would like to be
>>>> applied to every piece of the Spark framework (so spark master, spark
>>>> workers, spark executor sub-processes, spark-shell, etc.), I used to
>>>> do that with SPARK_JAVA_OPTS. Now I am unsure.
>>>>
>>>> SPARK_DAEMON_JAVA_OPTS works for the master and workers, but not for
>>>> the spark-shell, I think? I tried using SPARK_DAEMON_JAVA_OPTS, and it
>>>> does not seem that useful.
>>>> For example, for a worker it does not apply the settings to the
>>>> executor sub-processes, while SPARK_JAVA_OPTS does do that. So it
>>>> seems like SPARK_JAVA_OPTS is my only way to change settings for the
>>>> executors, yet it's deprecated?
>>>>
>>>>
>>>> On Wed, Jun 11, 2014 at 10:59 PM, elyast <lukasz.jastrzeb...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I tried to use SPARK_JAVA_OPTS in spark-env.sh as well as the
>>>>> conf/java-opts file to set additional java system properties. In this
>>>>> case I could connect to tachyon without any problem.
>>>>>
>>>>> However, when I tried setting executor and driver extraJavaOptions in
>>>>> spark-defaults.conf, it doesn't work.
>>>>>
>>>>> I suspect the root cause may be the following:
>>>>>
>>>>> SparkSubmit doesn't fork an additional JVM to actually run either the
>>>>> driver or the executor process, and the additional system properties
>>>>> are set after the JVM is created and other classes are loaded. It may
>>>>> happen that the Tachyon CommonConf class is already loaded, and since
>>>>> it's a singleton it won't pick up any changes to system properties.
>>>>>
>>>>> Please let me know what you think.
>>>>>
>>>>> Can I use conf/java-opts? It's not really documented anywhere.
>>>>>
>>>>> Best regards,
>>>>> Lukasz
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/little-confused-about-SPARK-JAVA-OPTS-alternatives-tp5798p7448.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>
>>>>
>>>
>>
>
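P.S. For anyone who does end up launching without spark-submit, here is a rough sketch of the workaround Andrew describes above: set the "spark.*" properties as JVM system properties before the SparkContext is created, either with -Dspark.x=y on the java command line or programmatically. The property name and value below are only placeholders.

    // Equivalent to passing -Dspark.akka.timeout=200 on the java command line.
    // A SparkConf created with its default constructor loads any system
    // property that starts with "spark." into the configuration.
    System.setProperty("spark.akka.timeout", "200")

    val sc = new org.apache.spark.SparkContext(new org.apache.spark.SparkConf())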