Dear Sandy,

Many thanks for your reply.

I am going to respond to your replies in reverse order, if you don't mind, as
my second question is the more pressing issue for now.

> In the situation where you give more memory, but less memory overhead, and
> the job completes less quickly, have you checked to see whether YARN is
> killing any containers?  It could be that the job completes more slowly
> because, without the memory overhead, YARN kills containers while it's
> running.  So it needs to run some tasks multiple times.


I sincerely apologize if the way I structured my post was confusing. My
second question was about the MEMORY_TOTAL that YARN allocates for the JVM,
and why two settings with different MEMORY_TOTAL values (assuming I
calculated them correctly) both lead to the same job running successfully.
To answer your reply above: both runs took about the same time and YARN did
kill containers in both cases, but that was not my question. Of my four
cases, the first and second are job failures, where the second case ran
longer before failing (leading to my first question), and the third and
fourth are jobs that completed successfully (leading to my second question).

*Second Question*
My concern was not that the job completes less quickly; rather, I did not
understand why setting the memoryOverhead configuration explicitly allows
the job to succeed with a lower MEMORY_TOTAL. I apologize in advance if I
misunderstood your message.

*/bin/spark-submit --class <class name> --master
yarn-cluster --driver-memory 11g --executor-memory 1g --num-executors 3
--executor-cores 1 --jars <jar file>*

If I do not mess with the default memory overhead settings as above, I have
to use driver memory greater than 10g for my job to run successfully.





*spark.driver.memory + spark.yarn.driver.memoryOverhead = the memory within
which YARN will create the JVM
= 11g + (driverMemory * 0.07, with a minimum of 384m)
= 11g + 0.77g
= 11.77g*
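To make sure I am not miscalculating, here is a minimal Scala sketch of how
I am applying that default-overhead formula (the 7% factor and 384m floor
come from my reading of the Spark 1.4 docs, so this is only as correct as my
reading of them):

object DefaultOverheadTotal {
  // Assumption on my part: the default overhead is max(0.07 * requested memory, 384 MB),
  // which is how I read "driverMemory * 0.07, with a minimum of 384m".
  val minOverheadMb = 384L
  val overheadFactor = 0.07

  def totalMb(requestedMb: Long): Long =
    requestedMb + math.max((requestedMb * overheadFactor).toLong, minOverheadMb)

  def main(args: Array[String]): Unit = {
    val driverMb = 11L * 1024                           // --driver-memory 11g
    println(s"MEMORY_TOTAL = ${totalMb(driverMb)} MB")  // 12052 MB, i.e. roughly 11.77g
  }
}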

From the formula above, my job appears to require a MEMORY_TOTAL of roughly
11.77g. However, I need a much lower MEMORY_TOTAL when I set the
memoryOverhead configuration explicitly, as below:


*/bin/spark-submit --class <class name> --master
yarn-cluster --driver-memory 2g --executor-memory 1g
--conf spark.yarn.executor.memoryOverhead=1024
--conf spark.yarn.driver.memoryOverhead=1024 --num-executors 3
--executor-cores 1 --jars <jar file>*




*spark.driver.memory + spark.yarn.driver.memoryOverhead = the memory within
which YARN will create the JVM
= 2g + 1024m (from the command-line configuration)
= 3g*
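And the same sketch for the explicit-override case, under my assumption that
the 1024m given on the command line simply replaces the default overhead
rather than being added on top of it:

object ExplicitOverheadTotal {
  // Assumption on my part: an explicit spark.yarn.driver.memoryOverhead=1024
  // replaces the default max(0.07 * memory, 384 MB) value entirely.
  def totalMb(requestedMb: Long, overheadMb: Long): Long =
    requestedMb + overheadMb

  def main(args: Array[String]): Unit = {
    val driverMb = 2L * 1024      // --driver-memory 2g
    val overheadMb = 1024L        // spark.yarn.driver.memoryOverhead=1024
    println(s"MEMORY_TOTAL = ${totalMb(driverMb, overheadMb)} MB")  // 3072 MB = 3g
  }
}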

I updated the second formula in the thread before I emailed you, but I
noticed in your reply that it had the wrong version. In this case the two
memory overheads themselves are quite close to each other, yet just by
setting *spark.yarn.executor.memoryOverhead=1024* and
*spark.yarn.driver.memoryOverhead=1024*, my job completes with a
MEMORY_TOTAL of 3g instead of the roughly 11.77g in the first case, which is
my source of confusion. Am I making a mistake somewhere? Or is setting
memoryOverhead explicitly doing something behind the scenes? I just tested
this again, and I always require a higher driver memory whenever I leave the
overhead at its default, as in the first case.

*First Question*

> For your first question, you would need to look in the logs and provide
> additional information about why your job is failing.  The SparkContext
> shutting down could happen for a variety of reasons.


My first question was about debugging failed Spark jobs, as I always seem to
receive a different error when I run the Spark job with slightly different
settings. I was able to tune the job so that it runs successfully by
increasing memory and memory overhead, but I hope you can enlighten me on
the nuances and differences in the error logs, because they all seem to me
to come down to a lack of memory. I am sorry that there was no log file in
my original message. I have re-run the jobs with the same settings that led
to the failed runs, and I will restate my first question with the error logs
and their respective diagnostics. I ran them around 10 times each, and the
error logs below are the most consistent.

If I run my job with */bin/spark-submit --class <class name> --master
yarn-cluster --driver-memory 7g --executor-memory 1g --num-executors 3
--executor-cores 1 --jars <jar file>*, it will give either Error Log 1 or
Error Log 2.

If I run my job with */bin/spark-submit --class <class name> --master
yarn-cluster --driver-memory 7g --executor-memory 3g --num-executors 3
--executor-cores 1 --jars <jar file>*, it will give Error Log 3. Error Logs
2 and 3 look similar, with the only difference being the diagnostics.

Is there a subtle difference between the errors being thrown (I know that
increasing memory and memory overhead solves the issue)? Why does increasing
the executor memory give a different type of error?

Thanks again for taking the time to answer my questions; I truly appreciate
it. I hope you won't mind the long email as I try to provide as much
information as I can!

I am keen to understand Spark properly, which is why I am asking these
questions, especially the second one above.

*Error Log 1*

org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14.apply(DAGScheduler.scala:1084)

org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14.apply(DAGScheduler.scala:1083)

scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1083)

org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)

org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)

org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

org.apache.spark.SparkException: Job aborted due to stage failure:
Task serialization failed: java.lang.IllegalStateException: Cannot
call methods on a stopped SparkContext

org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)

org.apache.spark.SparkContext.broadcast(SparkContext.scala:1282)

org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:874)

org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:1088)

org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14$$anonfun$apply$1.apply(DAGScheduler.scala:1084)

org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14$$anonfun$apply$1.apply(DAGScheduler.scala:1084)

scala.Option.foreach(Option.scala:236)

org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14.apply(DAGScheduler.scala:1084)

org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14.apply(DAGScheduler.scala:1083)

scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1083)

org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)

org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)

org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

        at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)

        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)

        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)

        at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

        at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)

        at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:884)

        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:1088)

        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14$$anonfun$apply$1.apply(DAGScheduler.scala:1084)

        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14$$anonfun$apply$1.apply(DAGScheduler.scala:1084)

        at scala.Option.foreach(Option.scala:236)

        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14.apply(DAGScheduler.scala:1084)

        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14.apply(DAGScheduler.scala:1083)

        at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

        at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1083)

        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)

        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)

        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)



Diagnostics:
User class threw exception: org.apache.spark.SparkException: Job
aborted due to stage failure: Task serialization failed:
java.lang.IllegalStateException: Cannot call methods on a stopped
SparkContext
org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
org.apache.spark.SparkContext.broadcast(SparkContext.scala:1282)
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:874)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:1088)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14$$anonfun$apply$1.apply(DAGScheduler.scala:1084)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14$$anonfun$apply$1.apply(DAGScheduler.scala:1084)
scala.Option.foreach(Option.scala:236)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14.apply(DAGScheduler.scala:1084)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$14.apply(DAGScheduler.scala:1083)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1083)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

*Error Log 2*

15/09/01 01:39:59 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM

15/09/01 01:40:00 ERROR yarn.ApplicationMaster: User class threw
exception: org.apache.spark.SparkException: Job cancelled because
SparkContext was shut down

org.apache.spark.SparkException: Job cancelled because SparkContext
was shut down

        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:736)

        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:735)

        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)

        at 
org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:735)

        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1468)

        at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)

        at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1403)

        at org.apache.spark.SparkContext.stop(SparkContext.scala:1642)

        at 
org.apache.spark.SparkContext$$anonfun$3.apply$mcV$sp(SparkContext.scala:559)

        at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2292)

        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Utils.scala:2262)

        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2262)

        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2262)

        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)

        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(Utils.scala:2262)

        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2262)

        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2262)

        at scala.util.Try$.apply(Try.scala:161)

        at 
org.apache.spark.util.SparkShutdownHookManager.runAll(Utils.scala:2262)

        at 
org.apache.spark.util.SparkShutdownHookManager$$anon$6.run(Utils.scala:2244)

        at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)



Diagnostics:
Application application_1440667888904_0079 failed 2 times due to AM
Container for appattempt_1440667888904_0079_000002 exited with
exitCode: -103
For more detailed output, check application tracking page:
http://cemas-1:8088/cluster/app/application_1440667888904_0079
Then, click on links to logs of each attempt.
Diagnostics: Container
[pid=14317,containerID=container_1440667888904_0079_02_000001] is
running beyond virtual memory limits. Current usage: 344.4 MB of 8 GB
physical memory used; 8.0 GB of 8 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1440667888904_0079_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
FULL_CMD_LINE
|- 14322 14317 14317 14317 (java) 1398 65 8607490048 87877
/usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Xmx7168m
-Djava.io.tmpdir=/opt/hadoop/var/nm-local-dir/usercache/ts444/appcache/application_1440667888904_0079/container_1440667888904_0079_02_000001/tmp
-Dspark.driver.memory=7g -Dspark.executor.memory=1g
-Dspark.master=yarn-cluster
-Dspark.app.name=CO880.testing.algorithm_v1.SeCo
-Dspark.yarn.app.container.log.dir=/opt/hadoop/var/userlogs/application_1440667888904_0079/container_1440667888904_0079_02_000001
org.apache.spark.deploy.yarn.ApplicationMaster --class
CO880.testing.algorithm_v1.SeCo --jar file:/home/cuc/ts444/SeCo1.jar
--arg mushroom.arff --arg mushroomtest.arff --executor-memory 1024m
--executor-cores 1 --num-executors 10
|- 14317 14315 14317 14317 (bash) 0 0 12750848 302 /bin/bash -c
/usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Xmx7168m
-Djava.io.tmpdir=/opt/hadoop/var/nm-local-dir/usercache/ts444/appcache/application_1440667888904_0079/container_1440667888904_0079_02_000001/tmp
'-Dspark.driver.memory=7g' '-Dspark.executor.memory=1g'
'-Dspark.master=yarn-cluster'
'-Dspark.app.name=CO880.testing.algorithm_v1.SeCo'
-Dspark.yarn.app.container.log.dir=/opt/hadoop/var/userlogs/application_1440667888904_0079/container_1440667888904_0079_02_000001
org.apache.spark.deploy.yarn.ApplicationMaster --class
'CO880.testing.algorithm_v1.SeCo' --jar file:/home/cuc/ts444/SeCo1.jar
--arg 'mushroom.arff' --arg 'mushroomtest.arff' --executor-memory
1024m --executor-cores 1 --num-executors 10 1>
/opt/hadoop/var/userlogs/application_1440667888904_0079/container_1440667888904_0079_02_000001/stdout
2> 
/opt/hadoop/var/userlogs/application_1440667888904_0079/container_1440667888904_0079_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.

*Error Log 3*

15/09/01 01:42:19 INFO cluster.YarnClusterSchedulerBackend:
SchedulerBackend is ready for scheduling beginning after reached
minRegisteredResourcesRatio: 0.8
15/09/01 01:42:19 INFO cluster.YarnClusterScheduler:
YarnClusterScheduler.postStartHook done
15/09/01 01:42:25 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
15/09/01 01:42:25 ERROR yarn.ApplicationMaster: User class threw
exception: org.apache.spark.SparkException: Job cancelled because
SparkContext was shut down
org.apache.spark.SparkException: Job cancelled because SparkContext
was shut down
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:736)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:735)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at 
org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:735)
        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1468)
        at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)
        at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1403)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1642)
        at 
org.apache.spark.SparkContext$$anonfun$3.apply$mcV$sp(SparkContext.scala:559)
        at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2292)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Utils.scala:2262)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2262)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2262)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(Utils.scala:2262)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2262)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2262)
        at scala.util.Try$.apply(Try.scala:161)
        at 
org.apache.spark.util.SparkShutdownHookManager.runAll(Utils.scala:2262)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anon$6.run(Utils.scala:2244)
        at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)


Diagnostics: User class threw exception: org.apache.spark.SparkException:
Job cancelled because SparkContext was shut down



On Mon, Aug 31, 2015 at 10:03 PM, Sandy Ryza <sandy.r...@cloudera.com>
wrote:

> Hi Timothy,
>
> For your first question, you would need to look in the logs and provide
> additional information about why your job is failing.  The SparkContext
> shutting down could happen for a variety of reasons.
>
> In the situation where you give more memory, but less memory overhead, and
> the job completes less quickly, have you checked to see whether YARN is
> killing any containers?  It could be that the job completes more slowly
> because, without the memory overhead, YARN kills containers while it's
> running.  So it needs to run some tasks multiple times.
>
> -Sandy
>
> On Sat, Aug 29, 2015 at 6:57 PM, timothy22000 <timothy22...@gmail.com>
> wrote:
>
>> I am doing some memory tuning on my Spark job on YARN and I notice
>> different
>> settings would give different results and affect the outcome of the Spark
>> job run. However, I am confused and do not understand completely why it
>> happens and would appreciate if someone can provide me with some guidance
>> and explanation.
>>
>> I will provide some background information and describe the cases that I
>> have experienced and post my questions after them below.
>>
>> *My environment setting were as below:*
>>
>>  - Memory 20G, 20 VCores per node (3 nodes in total)
>>  - Hadoop 2.6.0
>>  - Spark 1.4.0
>>
>> My code recursively filters an RDD to make it smaller (removing examples
>> as
>> part of an algorithm), then does mapToPair and collect to gather the
>> results
>> and save them within a list.
>>
>>  First Case
>>
>> /`/bin/spark-submit --class <class name> --master yarn-cluster
>> --driver-memory 7g --executor-memory 1g --num-executors 3
>> --executor-cores 1
>> --jars <jar file>`
>> /
>> If I run my program with any driver memory less than 11g, I will get the
>> error below which is the SparkContext being stopped or a similar error
>> which
>> is a method being called on a stopped SparkContext. From what I have
>> gathered, this is related to memory not being enough.
>>
>>
>> <
>> http://apache-spark-user-list.1001560.n3.nabble.com/file/n24507/EKxQD.png
>> >
>>
>> Second Case
>>
>>
>> /`/bin/spark-submit --class <class name> --master yarn-cluster
>> --driver-memory 7g --executor-memory 3g --num-executors 3
>> --executor-cores 1
>> --jars <jar file>`/
>>
>> If I run the program with the same driver memory but higher executor
>> memory,
>> the job runs longer (about 3-4 minutes) than the first case and then it
>> will
>> encounter a different error from earlier which is a Container
>> requesting/using more memory than allowed and is being killed because of
>> that. Although I find it weird since the executor memory is increased and
>> this error occurs instead of the error in the first case.
>>
>> <
>> http://apache-spark-user-list.1001560.n3.nabble.com/file/n24507/tr24f.png
>> >
>>
>> Third Case
>>
>>
>> /`/bin/spark-submit --class <class name> --master yarn-cluster
>> --driver-memory 11g --executor-memory 1g --num-executors 3
>> --executor-cores
>> 1 --jars <jar file>`/
>>
>> Any setting with driver memory greater than 10g will lead to the job being
>> able to run successfully.
>>
>> Fourth Case
>>
>>
>> /`/bin/spark-submit --class <class name> --master yarn-cluster
>> --driver-memory 2g --executor-memory 1g --conf
>> spark.yarn.executor.memoryOverhead=1024 --conf
>> spark.yarn.driver.memoryOverhead=1024 --num-executors 3 --executor-cores 1
>> --jars <jar file>`
>> /
>> The job will run successfully with this setting (driver memory 2g and
>> executor memory 1g but increasing the driver memory overhead(1g) and the
>> executor memory overhead(1g).
>>
>> Questions
>>
>>
>>  1. Why is a different error thrown and the job runs longer (for the
>> second
>> case) between the first and second case with only the executor memory
>> being
>> increased? Are the two errors linked in some way?
>>
>>  2. Both the third and fourth case succeeds and I understand that it is
>> because I am giving more memory which solves the memory problems. However,
>> in the third case,
>>
>> /spark.driver.memory + spark.yarn.driver.memoryOverhead = the memory that
>> YARN will create a JVM
>> = 11g + (driverMemory * 0.07, with minimum of 384m)
>> = 11g + 1.154g
>> = 12.154g/
>>
>> So, from the formula, I can see that my job requires MEMORY_TOTAL of
>> around
>> 12.154g to run successfully which explains why I need more than 10g for
>> the
>> driver memory setting.
>>
>> But for the fourth case,
>>
>> /
>> spark.driver.memory + spark.yarn.driver.memoryOverhead = the memory that
>> YARN will create a JVM
>> = 2 + (driverMemory * 0.07, with minimum of 384m)
>> = 2g + 0.524g
>> = 2.524g
>> /
>>
>> It seems that just by increasing the memory overhead by a small amount of
>> 1024(1g) it leads to the successful run of the job with driver memory of
>> only 2g and the MEMORY_TOTAL is only 2.524g! Whereas without the overhead
>> configuration, driver memory less than 11g fails but it doesn't make sense
>> from the formula which is why I am confused.
>>
>> Why increasing the memory overhead (for both driver and executor) allows
>> my
>> job to complete successfully with a lower MEMORY_TOTAL (12.154g vs
>> 2.524g)?
>> Is there some other internal things at work here that I am missing?
>>
>> I would really appreciate any helped offered as it would really help with
>> my
>> understanding of Spark. Thanks in advance.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Effects-of-Driver-Memory-Executor-Memory-Driver-Memory-Overhead-and-Executor-Memory-Overhead-os-tp24507.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
