I had a problem similar to #2 when I used a lot of caching and then did shuffling. It looks like when I cached too much there was not enough space for other Spark tasks, and the job just hung.
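For what it's worth, one way to keep heavy caching from starving the rest of the job is to cap the storage pool and let cached blocks spill to disk. A rough sketch with spark-submit (assumes Spark 1.6+ with the unified memory manager; the fraction values are illustrative, not recommendations):

```shell
# Cap how much of the unified memory pool caching may claim, so shuffle
# and task execution keep room to work. Cached partitions beyond the cap
# are evicted or spilled rather than blocking other tasks.
spark-submit \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.3 \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  your-app.jar
```

The extra JVM options just turn on GC logging in the executor logs, which makes long pauses and spill-related pressure easier to spot.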
You can try caching less and see if that improves things; executor logs also help a lot (watch for log entries about spill). You can also monitor the jobs' JVMs through Spark monitoring (http://spark.apache.org/docs/latest/monitoring.html) together with Graphite and Grafana.

On Tue, Feb 16, 2016 at 2:14 PM, Iulian Dragoș <iulian.dra...@typesafe.com> wrote:
> Regarding your 2nd problem, my best guess is that you’re seeing GC pauses.
> It’s not unusual, given you’re using 40GB heaps. See for instance this blog
> post:
>
> From conducting numerous tests, we have concluded that unless you are
> utilizing some off-heap technology (e.g. GridGain OffHeap), no Garbage
> Collector provided with the JDK will render any kind of stable GC performance
> with heap sizes larger than 16GB. For example, on 50GB heaps we can often
> encounter up to 5-minute GC pauses, with average pauses of 2 to 4 seconds.
>
> Not sure if Yarn can do this, but I would try to run with a smaller executor
> heap, and more executors per node.
>
> iulian

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org