Ok, cool. This seems to be a general issue in the JVM with very large heaps. I
agree that the best workaround would be to keep the heap size below 32GB.
Thanks guys!

Mingyu

From:  Arun Ahuja <aahuj...@gmail.com>
Date:  Monday, October 6, 2014 at 7:50 AM
To:  Andrew Ash <and...@andrewash.com>
Cc:  Mingyu Kim <m...@palantir.com>, "user@spark.apache.org"
<user@spark.apache.org>, Dennis Lawler <dlaw...@palantir.com>
Subject:  Re: Larger heap leads to perf degradation due to GC

We have used the strategy that you suggested, Andrew - using many workers
per machine and keeping the heaps small (< 20gb).

Using a large heap resulted in workers hanging or not responding (leading to
timeouts).  The same dataset/job for us will fail (most often due to akka
disassociated or fetch failure errors) with 10 cores / 100 executors / 60GB
per executor, while it succeeds with 1 core / 1000 executors / 6GB per executor.

When the job does succeed with more cores per executor and a larger heap, it
is usually much slower than with the smaller executors (the same 8-10 min job
taking 15-20 min to complete).
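
For anyone who wants to try the same comparison, the two layouts above map
roughly onto spark-submit flags like the following (just a sketch, assuming
YARN; the application jar and arguments are elided):

# many small executors - the layout that succeeded for us
spark-submit --num-executors 1000 --executor-cores 1 --executor-memory 6g ...

# few large executors - the layout that hung / hit akka disassociation
spark-submit --num-executors 100 --executor-cores 10 --executor-memory 60g ...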

The unfortunate downside of this has been that we have had some large broadcast
variables which may not fit into memory (and are unnecessarily duplicated) when
using the smaller executors.
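
(To make the duplication concrete with made-up numbers: a broadcast variable is
materialized once per executor JVM, so a hypothetical 2GB broadcast on a machine
running eight of the small executors occupies roughly 16GB of that machine's
memory in copies, versus roughly 2GB when the same machine runs a single large
executor.)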

Most of this is anecdotal, but for the most part we have had more success and
consistency with more executors with smaller memory requirements.

On Sun, Oct 5, 2014 at 7:20 PM, Andrew Ash <and...@andrewash.com> wrote:
> Hi Mingyu, 
> 
> Maybe we should be limiting our heaps to 32GB max and running multiple workers
> per machine to avoid large GC issues.
> 
> For a machine with 128GB of memory and 32 cores, this could look like:
> 
> SPARK_WORKER_INSTANCES=4
> SPARK_WORKER_MEMORY=32g
> SPARK_WORKER_CORES=8
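> 
> (Sanity check on the arithmetic: 4 workers x 32g = 128GB and 4 x 8 = 32 cores,
> so the whole box is used. These SPARK_WORKER_* values are the standalone-mode
> settings from conf/spark-env.sh; if you're on YARN, the analogous per-application
> knobs would be --num-executors / --executor-cores / --executor-memory.)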
> 
> Are people running with large (32GB+) executor heaps in production?  I'd be
> curious to hear if so.
> 
> Cheers!
> Andrew
> 
> On Thu, Oct 2, 2014 at 1:30 PM, Mingyu Kim <m...@palantir.com> wrote:
>> This issue definitely needs more investigation, but I just wanted to quickly
>> check if anyone has run into this problem or has general guidance around it.
>> We've seen a performance degradation with a large heap on a simple map task
>> (i.e. no shuffle). We've seen the slowness starting at around a 50GB heap
>> (i.e. spark.executor.memory=50g), and when we checked the CPU usage, there
>> were just a lot of GCs going on.
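>> 
>> In case it helps, one way to confirm it really is GC time (just an illustrative
>> setting, nothing carefully tuned) is to enable GC logging on the executors, e.g.
>> 
>>   --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>> 
>> and see how much wall-clock time the collections account for.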
>> 
>> Has anyone seen a similar problem?
>> 
>> Thanks,
>> Mingyu
> 


