hey matei, it happens repeatedly. we are currently running on java 6 with spark 0.9.
i will add -XX:+PrintGCDetails and collect details, and also look into the java 7 G1 collector. thanks

On Mon, Mar 10, 2014 at 6:27 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Does this happen repeatedly if you keep running the computation, or just
> the first time? It may take time to move these Java objects to the old
> generation the first time you run queries, which could lead to a GC pause
> that also slows down the small queries.
>
> If you can run with -XX:+PrintGCDetails in your Java options, it would
> also be good to see what percent of each GC generation is used.
>
> The concurrent mark-and-sweep GC (-XX:+UseConcMarkSweepGC) or the G1 GC in
> Java 7 (-XX:+UseG1GC) might also avoid these pauses by GCing concurrently
> with your application threads.
>
> Matei
>
> On Mar 10, 2014, at 3:18 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
> hello all,
> i am observing a strange result. i have a computation that i run on a
> cached RDD in spark-standalone. it typically takes about 4 seconds.
>
> but when other RDDs that are not relevant to the computation at hand are
> cached in memory (in the same spark context), the computation takes 40
> seconds or more.
>
> the problem seems to be GC time, which goes from milliseconds to tens of
> seconds.
>
> note that my issue is not that memory is full. i have cached about 14G in
> RDDs with 66G available across workers for the application. also my
> computation did not push any cached RDD out of memory.
>
> any ideas?
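for anyone following along: in spark 0.9 the JVM flags discussed above would typically be passed to workers through SPARK_JAVA_OPTS in conf/spark-env.sh. a minimal sketch (flag combinations are illustrative, not a recommendation):

```shell
# conf/spark-env.sh -- sketch for Spark 0.9, where SPARK_JAVA_OPTS is
# applied to the JVMs Spark launches (this variable was deprecated in
# later Spark releases in favor of per-role settings).

# Log per-collection details and timestamps so generation occupancy
# can be inspected after the slow queries:
export SPARK_JAVA_OPTS="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

# Alternatively, try a concurrent collector to reduce pause times:
# export SPARK_JAVA_OPTS="-XX:+UseConcMarkSweepGC"   # CMS (Java 6+)
# export SPARK_JAVA_OPTS="-XX:+UseG1GC"              # G1 (Java 7+)
```

with PrintGCDetails enabled, the worker stdout/stderr logs will show young- and old-generation sizes before and after each collection, which should confirm whether the long pauses are full GCs of the old generation holding the cached RDD blocks.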