Hi Tamas,

Yes, Marcelo is right. The reason it doesn't make sense to set
"spark.driver.memory" in your SparkConf is that your application code, by
definition, *is* the driver. By the time your code initializes the
SparkConf, the driver JVM has already started with some heap size, and you
can't change a JVM's heap size after it has started. Note that this is
true regardless of the deploy mode (client or cluster).
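
If you want to verify what the driver-side JVM actually got, here is a
rough sketch using the py4j gateway that pyspark exposes as sc._jvm (the
number won't equal -Xmx exactly, but it makes a 1G vs. default heap
obvious):

    # run after sc = SparkContext(conf=conf)
    print sc._conf.get("spark.driver.memory")                  # what the conf claims
    print sc._jvm.java.lang.Runtime.getRuntime().maxMemory()   # actual max heap, in bytes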

There are two alternatives: (1) set "spark.driver.memory" in
`spark-defaults.conf` on the node that submits the application, or (2)
pass the --driver-memory command line option to spark-submit (bin/pyspark
goes through this path, as you have discovered on your own).
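
For example (the "2g" value and "your_app.py" are just placeholders):

    # in conf/spark-defaults.conf on the machine you submit from
    spark.driver.memory    2g

or, equivalently, on the command line:

    ./bin/spark-submit --driver-memory 2g your_app.py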

Does that make sense?


2014-10-01 10:17 GMT-07:00 Tamas Jambor <jambo...@gmail.com>:

> When you say "respective backend code to launch it", I thought this was
> the way to do that.
>
> thanks,
> Tamas
>
> On Wed, Oct 1, 2014 at 6:13 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
> > Because that's not how you launch apps in cluster mode; you have to do
> > it through the command line, or by calling the respective backend code
> > directly to launch it.
> >
> > (That being said, it would be nice to have a programmatic way of
> > launching apps that handled all this - this has been brought up in a
> > few different contexts, but I don't think there's an "official"
> > solution yet.)
> >
> > On Wed, Oct 1, 2014 at 9:59 AM, Tamas Jambor <jambo...@gmail.com> wrote:
> >> thanks Marcelo.
> >>
> >> What's the reason it is not possible in cluster mode, either?
> >>
> >> On Wed, Oct 1, 2014 at 5:42 PM, Marcelo Vanzin <van...@cloudera.com>
> >> wrote:
> >>> You can't set the driver memory programmatically in client mode. In
> >>> that mode, the same JVM is running the driver, so you can't modify
> >>> command line options anymore when initializing the SparkContext.
> >>>
> >>> (And you can't really start cluster mode apps that way, so the only
> >>> way to set this is through the command line / config files.)
> >>>
> >>> On Wed, Oct 1, 2014 at 9:26 AM, jamborta <jambo...@gmail.com> wrote:
> >>>> Hi all,
> >>>>
> >>>> I cannot figure out why this command is not setting the driver memory
> >>>> (it is setting the executor memory):
> >>>>
> >>>>     conf = (SparkConf()
> >>>>                 .setMaster("yarn-client")
> >>>>                 .setAppName("test")
> >>>>                 .set("spark.driver.memory", "1G")
> >>>>                 .set("spark.executor.memory", "1G")
> >>>>                 .set("spark.executor.instances", 2)
> >>>>                 .set("spark.executor.cores", 4))
> >>>>     sc = SparkContext(conf=conf)
> >>>>
> >>>> whereas if I run the spark console:
> >>>> ./bin/pyspark --driver-memory 1G
> >>>>
> >>>> it sets it correctly. Seemingly they both generate the same commands
> >>>> in the logs.
> >>>>
> >>>> thanks a lot,
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Marcelo
> >
> >
> >
> > --
> > Marcelo
>
>
>
