Yeah, System.gc() is a suggestion but in practice it does invoke full GCs on 
the Sun JVM.
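
Scheduling it right after the pre-caching, as you suggest, could look roughly 
like this (cachedRdd and numNodes are just placeholder names here, not anything 
specific from your setup):

  cachedRdd.cache()
  cachedRdd.count()  // materialize the cached partitions so their garbage gets generated now
  sc.parallelize(1 to numNodes, numNodes).foreach(x => System.gc())  // then force a full GC on every node up front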

Matei

On Mar 11, 2014, at 12:35 PM, Koert Kuipers <ko...@tresata.com> wrote:

> hey matei,
> ha, i will definitely try that one! looks like a total hack... i might just 
> schedule it defensively after the pre-caching of rdds.
> 
> also trying java 7 with g1
> 
> 
> On Tue, Mar 11, 2014 at 3:17 PM, Matei Zaharia <matei.zaha...@gmail.com> 
> wrote:
> Right, that’s it. I think what happened is the following: all the nodes 
> generated some garbage that put them very close to the threshold for a full 
> GC during the first few runs of the program (when you cached the RDDs), but 
> on the subsequent queries only a few nodes hit a full GC per query, so each 
> query sees a slowdown and the problem persists for a while (until every node 
> has gone through its full GC). You can try manually forcing a GC on the 
> nodes like this after you do your loading:
> 
> sc.parallelize(1 to numNodes, numNodes).foreach(x => System.gc())
> 
> Where numNodes is your number of nodes. (Actually it’s also fine to make this 
> number higher; System.gc() returns quickly when there’s no GC to run.)
> 
> Matei
> 
> On Mar 11, 2014, at 7:12 AM, Koert Kuipers <ko...@tresata.com> wrote:
> 
>> hey matei,
>> most tasks have GC times of 200ms or less, and then a few tasks take many 
>> seconds. example GC activity for a slow one:
>> 
>> [GC [PSYoungGen: 1051814K->262624K(1398144K)] 3789259K->3524429K(5592448K), 
>> 0.0986800 secs] [Times: user=1.53 sys=0.01, real=0.10 secs]
>> [GC [PSYoungGen: 786935K->524512K(1398144K)] 4048741K->4048762K(5592448K), 
>> 0.1132490 secs] [Times: user=1.70 sys=0.01, real=0.11 secs]
>> [Full GC [PSYoungGen: 524512K->0K(1398144K)] [PSOldGen: 
>> 3524250K->2207344K(4194304K)] 4048762K->2207344K(5592448K) [PSPermGen: 
>> 56545K->54639K(83968K)], 7.7059350 secs] 
>> [Times: user=7.71 sys=0.00, real=7.70 secs]
>> 
>> 
>> so it looks like i am getting hit by stop-the-world gc?
>> 
>> 
>> On Mon, Mar 10, 2014 at 7:00 PM, Koert Kuipers <ko...@tresata.com> wrote:
>> hey matei,
>> it happens repeatedly.
>> 
>> we are currently running on java 6 with spark 0.9.
>> 
>> i will add -XX:+PrintGCDetails and collect details, and also look into java 
>> 7 G1. thanks
>> 
>> On Mon, Mar 10, 2014 at 6:27 PM, Matei Zaharia <matei.zaha...@gmail.com> 
>> wrote:
>> Does this happen repeatedly if you keep running the computation, or just the 
>> first time? It may take time to move these Java objects to the old 
>> generation the first time you run queries, which could lead to a GC pause 
>> that also slows down the small queries.
>> 
>> If you can run with -XX:+PrintGCDetails in your Java options, it would also 
>> be good to see what percent of each GC generation is used.
>> 
>> The concurrent mark-and-sweep GC (-XX:+UseConcMarkSweepGC) or the G1 GC in 
>> Java 7 (-XX:+UseG1GC) might also avoid these pauses by collecting 
>> concurrently with your application threads.
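>> 
>> In case it helps, one way to pass these flags in standalone mode is through 
>> conf/spark-env.sh on the workers (just a sketch; adjust to however you 
>> launch your cluster):
>> 
>>   # conf/spark-env.sh
>>   export SPARK_JAVA_OPTS="-XX:+PrintGCDetails"
>>   # or, to try a concurrent collector instead:
>>   # export SPARK_JAVA_OPTS="-XX:+UseConcMarkSweepGC"
>>   # export SPARK_JAVA_OPTS="-XX:+UseG1GC"   # Java 7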
>> 
>> Matei
>> 
>> On Mar 10, 2014, at 3:18 PM, Koert Kuipers <ko...@tresata.com> wrote:
>> 
>>> hello all,
>>> i am observing a strange result. i have a computation that i run on a 
>>> cached RDD in spark-standalone. it typically takes about 4 seconds. 
>>> 
>>> but when other RDDs that are not relevant to the computation at hand are 
>>> cached in memory (in the same spark context), the computation takes 40 seconds 
>>> or more.
>>> 
>>> the problem seems to be GC time, which goes from milliseconds to tens of 
>>> seconds.
>>> 
>>> note that my issue is not that memory is full. i have cached about 14G in 
>>> RDDs with 66G available across workers for the application. also my 
>>> computation did not push any cached RDD out of memory.
>>> 
>>> any ideas?
>> 
>> 
>> 
> 
> 
