I think you need to increase the executor memory size, either via the
command-line argument "--executor-memory" or the configuration
"spark.executor.memory".

Also increase yarn.scheduler.maximum-allocation-mb on the YARN side if necessary.
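
For example (a minimal sketch; the values are illustrative, not a
recommendation for your cluster):

  spark-submit --master yarn --executor-memory 40g ...
  # or equivalently
  spark-submit --master yarn --conf spark.executor.memory=40g ...

and, if executor memory plus overhead would exceed the current container
ceiling, in yarn-site.xml:

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>53248</value>
  </property>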

Thanks
Saisai


On Mon, Sep 21, 2015 at 5:13 PM, Alexander Pivovarov <apivova...@gmail.com>
wrote:

> I noticed that some executors have an issue with scratch space.
> I see the following in the YARN app container stderr around the time when
> YARN killed the executor for using too much memory.
>
> -- App container stderr --
> 15/09/21 21:43:22 WARN storage.MemoryStore: Not enough space to cache
> rdd_6_346 in memory! (computed 3.0 GB so far)
> 15/09/21 21:43:22 INFO storage.MemoryStore: Memory use = 477.6 KB (blocks)
> + 24.4 GB (scratch space shared across 8 thread(s)) = 24.4 GB. Storage
> limit = 25.2 GB.
> 15/09/21 21:43:22 WARN spark.CacheManager: Persisting partition rdd_6_346
> to disk instead.
> 15/09/21 21:43:22 WARN storage.MemoryStore: Not enough space to cache
> rdd_6_49 in memory! (computed 3.1 GB so far)
> 15/09/21 21:43:22 INFO storage.MemoryStore: Memory use = 477.6 KB (blocks)
> + 24.4 GB (scratch space shared across 8 thread(s)) = 24.4 GB. Storage
> limit = 25.2 GB.
> 15/09/21 21:43:22 WARN spark.CacheManager: Persisting partition rdd_6_49
> to disk instead.
>
> -- Yarn Nodemanager log --
> 2015-09-21 21:44:05,716 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> (Container Monitor): Container
> [pid=5114,containerID=container_1442869100946_0001_01_000056] is running
> beyond physical memory limits. Current usage: 52.2 GB of
> 52 GB physical memory used; 53.0 GB of 260 GB virtual memory used. Killing
> container.
> Dump of the process-tree for container_1442869100946_0001_01_000056 :
>         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>         |- 5117 5114 5114 5114 (java) 1322810 27563 56916316160 13682772
> /usr/lib/jvm/java-openjdk/bin/java -server -XX:OnOutOfMemoryError=kill %p
> -Xms47924m -Xmx47924m -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70
> -XX:+CMSClassUnloadingEnabled
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1442869100946_0001/container_1442869100946_0001_01_000056/tmp
> -Dspark.akka.failure-detector.threshold=3000.0
> -Dspark.akka.heartbeat.interval=10000s -Dspark.akka.threads=4
> -Dspark.history.ui.port=18080 -Dspark.akka.heartbeat.pauses=60000s
> -Dspark.akka.timeout=1000s -Dspark.akka.frameSize=50
> -Dspark.driver.port=52690
> -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1442869100946_0001/container_1442869100946_0001_01_000056
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
> akka.tcp://sparkDriver@10.0.24.153:52690/user/CoarseGrainedScheduler
> --executor-id 55 --hostname ip-10-0-28-96.ec2.internal --cores 8 --app-id
> application_1442869100946_0001 --user-class-path
> file:/mnt/yarn/usercache/hadoop/appcache/application_1442869100946_0001/container_1442869100946_0001_01_000056/__app__.jar
>         |- 5114 5112 5114 5114 (bash) 0 0 9658368 291 /bin/bash -c
> /usr/lib/jvm/java-openjdk/bin/java -server -XX:OnOutOfMemoryError='kill %p'
> -Xms47924m -Xmx47924m '-verbose:gc' '-XX:+PrintGCDetails'
> '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC'
> '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70'
> '-XX:+CMSClassUnloadingEnabled'
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1442869100946_0001/container_1442869100946_0001_01_000056/tmp
> '-Dspark.akka.failure-detector.threshold=3000.0'
> '-Dspark.akka.heartbeat.interval=10000s' '-Dspark.akka.threads=4'
> '-Dspark.history.ui.port=18080' '-Dspark.akka.heartbeat.pauses=60000s'
> '-Dspark.akka.timeout=1000s' '-Dspark.akka.frameSize=50'
> '-Dspark.driver.port=52690'
> -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1442869100946_0001/container_1442869100946_0001_01_000056
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
> akka.tcp://sparkDriver@10.0.24.153:52690/user/CoarseGrainedScheduler
> --executor-id 55 --hostname ip-10-0-28-96.ec2.internal --cores 8 --app-id
> application_1442869100946_0001 --user-class-path
> file:/mnt/yarn/usercache/hadoop/appcache/application_1442869100946_0001/container_1442869100946_0001_01_000056/__app__.jar
> 1>
> /var/log/hadoop-yarn/containers/application_1442869100946_0001/container_1442869100946_0001_01_000056/stdout
> 2>
> /var/log/hadoop-yarn/containers/application_1442869100946_0001/container_1442869100946_0001_01_000056/stderr
>
>
>
> Is it possible to find out what the scratch space is used for?
>
> Which Spark setting should I try adjusting to solve the issue?
>
> On Thu, Sep 10, 2015 at 2:52 PM, Sandy Ryza <sandy.r...@cloudera.com>
> wrote:
>
>> YARN will never kill processes for being unresponsive.
>>
>> It may kill processes for occupying more memory than it allows.  To get
>> around this, you can either bump spark.yarn.executor.memoryOverhead or turn
>> off the memory checks entirely with yarn.nodemanager.pmem-check-enabled.
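>>
>> For example (a sketch; the overhead value is illustrative):
>>
>>   spark-submit --conf spark.yarn.executor.memoryOverhead=8192 ...
>>
>> or, to disable the physical memory check, in yarn-site.xml:
>>
>>   <property>
>>     <name>yarn.nodemanager.pmem-check-enabled</name>
>>     <value>false</value>
>>   </property>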
>>
>> -Sandy
>>
>> On Tue, Sep 8, 2015 at 10:48 PM, Alexander Pivovarov <
>> apivova...@gmail.com> wrote:
>>
>>> The problem we have now is data skew (2360 tasks finish in 5 min, 3
>>> tasks take 40 min, and 1 task takes 2 hours)
>>>
>>> Some people on the team worry that the executor which runs the longest
>>> task can be killed by YARN (because the executor might be unresponsive
>>> due to GC, or it might occupy more memory than YARN allows)
>>>
>>>
>>>
>>> On Tue, Sep 8, 2015 at 3:02 PM, Sandy Ryza <sandy.r...@cloudera.com>
>>> wrote:
>>>
>>>> Those settings seem reasonable to me.
>>>>
>>>> Are you observing performance that's worse than you would expect?
>>>>
>>>> -Sandy
>>>>
>>>> On Mon, Sep 7, 2015 at 11:22 AM, Alexander Pivovarov <
>>>> apivova...@gmail.com> wrote:
>>>>
>>>>> Hi Sandy
>>>>>
>>>>> Thank you for your reply
>>>>> Currently we use r3.2xlarge boxes (vCPU: 8, Mem: 61 GiB)
>>>>> with the EMR setting for Spark "maximizeResourceAllocation": "true"
>>>>>
>>>>> It is automatically converted to the Spark settings
>>>>> spark.executor.memory            47924M
>>>>> spark.yarn.executor.memoryOverhead 5324
>>>>>
>>>>> we also set spark.default.parallelism = slave_count * 16
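>>>>>
>>>>> (For reference: 47924 MB + 5324 MB overhead = 53248 MB = 52 GB per
>>>>> container, and with, say, 150 slaves, parallelism = 150 * 16 = 2400.)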
>>>>>
>>>>> Does this look good to you? (we run a single heavy job on the cluster)
>>>>>
>>>>> Alex
>>>>>
>>>>> On Mon, Sep 7, 2015 at 11:03 AM, Sandy Ryza <sandy.r...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Alex,
>>>>>>
>>>>>> If they're both configured correctly, there's no reason Spark
>>>>>> Standalone should provide a performance or memory improvement over
>>>>>> Spark on YARN.
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>> On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov <
>>>>>> apivova...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Everyone
>>>>>>>
>>>>>>> We are trying the latest AWS emr-4.0.0 with Spark, and my question is
>>>>>>> about YARN vs Standalone mode.
>>>>>>> Our use case is:
>>>>>>> - start a 100-150 node cluster every week,
>>>>>>> - run one heavy Spark job (5-6 hours),
>>>>>>> - save data to S3,
>>>>>>> - stop the cluster.
>>>>>>>
>>>>>>> Officially, AWS emr-4.0.0 comes with Spark on YARN.
>>>>>>> It's probably possible to hack EMR with a bootstrap script that stops
>>>>>>> YARN and starts a master and slaves on each machine (to run Spark in
>>>>>>> standalone mode).
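>>>>>>>
>>>>>>> A rough sketch of such a bootstrap script; the service name, the
>>>>>>> /usr/lib/spark path, and the IS_MASTER / MASTER_HOST variables are
>>>>>>> assumptions that would need checking on emr-4.0.0:
>>>>>>>
>>>>>>>   #!/bin/bash
>>>>>>>   # Stop the YARN NodeManager so Spark standalone gets the whole box
>>>>>>>   # (the service name here is an assumption)
>>>>>>>   sudo stop hadoop-yarn-nodemanager || true
>>>>>>>   if [ "$IS_MASTER" = "true" ]; then
>>>>>>>     /usr/lib/spark/sbin/start-master.sh
>>>>>>>   else
>>>>>>>     # start-slave.sh takes the master URL in Spark 1.4+
>>>>>>>     /usr/lib/spark/sbin/start-slave.sh "spark://$MASTER_HOST:7077"
>>>>>>>   fi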
>>>>>>>
>>>>>>> My questions are:
>>>>>>> - Does Spark Standalone provide a significant performance / memory
>>>>>>> improvement in comparison to YARN mode?
>>>>>>> - Is it worth hacking the official EMR Spark on YARN setup to switch
>>>>>>> Spark to Standalone mode?
>>>>>>>
>>>>>>>
>>>>>>> I already created a comparison table and want you to check whether my
>>>>>>> understanding is correct.
>>>>>>>
>>>>>>> Let's say an r3.2xlarge machine has 52GB of RAM available for Spark
>>>>>>> executor JVMs.
>>>>>>>
>>>>>>>                 Standalone vs YARN comparison
>>>>>>>
>>>>>>>                                                        STDLN | YARN
>>>>>>> can executor allocate up to 52GB ram                 -  yes | yes
>>>>>>> will executor be unresponsive after using all
>>>>>>> 52GB ram because of GC                               -  yes | yes
>>>>>>> additional JVMs on slave besides the spark executor  - workr| node mngr
>>>>>>> are additional JVMs lightweight                      -  yes | yes
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
