Thanks for the suggestions.
I'm experimenting with different values for spark.yarn.executor.memoryOverhead
and explicitly giving the executors more memory, but I still haven't found the
sweet spot that lets the job finish in a reasonable time frame.
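
If I'm reading the NodeManager log below correctly, the 2.7 GB container
limit is just the 2400m executor heap plus the default 384 MB of
memoryOverhead (2400 + 384 = 2784 MB), so it's off-heap usage that pushes
the container over the line. The sort of thing I'm trying (the values are
guesses for 8 GB nodes, not a known-good config):

  --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=1024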

Is my cluster massively undersized at 5 boxes with 8 GB of RAM and 2 CPUs
each? I'm trying to figure out memory and executor settings so the job runs
on many containers in parallel.
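
Back-of-the-envelope, assuming YARN gets roughly 6 GB and 2 vcores per node
once the OS and Hadoop daemons take their share (I still need to check our
actual yarn.nodemanager.resource.memory-mb): 5 nodes x 2 cores caps the
cluster at 10 containers, one of which goes to the application master, so
something like

  --num-executors 9 --executor-cores 1 --executor-memory 2g

with ~512 MB of memoryOverhead keeps each container under 3 GB and gives
9-way parallelism. Whether a 2 GB heap per executor is enough is the open
question.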

I'm still puzzled that Pig and Hive jobs over the same full data set don't
take as long. I'm also wondering whether the logic in our code is doing
something silly that causes multiple reads of all the data.
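
For instance, if the same RDD feeds several actions without being cached,
Spark recomputes the whole lineage (including the HDFS read) for each
action. A minimal Scala sketch of the pattern I mean (the path, the Event
type, and the parse logic are made-up stand-ins for our real job):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("scan-check"))

  case class Event(userId: String, isError: Boolean)
  def parse(line: String): Event =
    Event(line.split(",")(0), line.contains("ERROR"))

  val parsed = sc.textFile("hdfs:///data/events").map(parse)
  // Without the next line, BOTH actions below re-read the input in full:
  parsed.cache()
  val errors = parsed.filter(_.isError).count()        // single scan, fills cache
  val users  = parsed.map(_.userId).distinct().count() // reuses cached partitions

If our code does the equivalent without the cache(), that alone would
explain repeated reads of the whole data set.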


On Fri, Feb 20, 2015 at 9:45 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> If that's the error you're hitting, the fix is to boost
> spark.yarn.executor.memoryOverhead, which will put some extra room in
> between the executor heap sizes and the amount of memory requested for them
> from YARN.
>
> -Sandy
>
> On Fri, Feb 20, 2015 at 9:40 AM, lbierman <leebier...@gmail.com> wrote:
>
>> A bit more context on this issue: the container logs from the executor are
>> below.
>>
>> Given my cluster specs above, what would be appropriate parameters to pass
>> in:
>> --num-executors --executor-cores --executor-memory
>>
>> I had tried it with --executor-memory 2500MB
>>
>> 2015-02-20 06:50:09,056 WARN
>>
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>> Container [pid=23320,containerID=container_1423083596644_0238_01_004160] is
>> running beyond physical memory limits. Current usage: 2.8 GB of 2.7 GB
>> physical memory used; 4.4 GB of 5.8 GB virtual memory used. Killing
>> container.
>> Dump of the process-tree for container_1423083596644_0238_01_004160 :
>>         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>>         |- 23320 23318 23320 23320 (bash) 0 0 108650496 305 /bin/bash -c
>> /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError='kill %p'
>> -Xms2400m
>> -Xmx2400m
>>
>> -Djava.io.tmpdir=/dfs/yarn/nm/usercache/root/appcache/application_1423083596644_0238/container_1423083596644_0238_01_004160/tmp
>>
>> -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160
>> org.apache.spark.executor.CoarseGrainedExecutorBackend
>> akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/CoarseGrainedScheduler
>> 8 ip-10-99-162-56.ec2.internal 1 application_1423083596644_0238 1>
>>
>> /var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160/stdout
>> 2>
>>
>> /var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160/stderr
>>         |- 23323 23320 23320 23320 (java) 922271 12263 4612222976 724218
>> /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms2400m
>> -Xmx2400m
>>
>> -Djava.io.tmpdir=/dfs/yarn/nm/usercache/root/appcache/application_1423083596644_0238/container_1423083596644_0238_01_004160/tmp
>>
>> -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160
>> org.apache.spark.executor.CoarseGrainedExecutorBackend
>> akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/Coarse
