Hi Ximo. Regarding #1, you can try increasing the number of partitions used for the cogroup or reduce. AFAIK Spark needs enough memory to hold all the data processed by a given partition, so by increasing the number of partitions you reduce the load on each task. We would probably need to know more about your workflow to assess whether that is your case.
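As a rough sketch of how you might pick that number: size the partitions from the volume of data going through the shuffle, then pass the result as the `numPartitions` argument that `cogroup` and `reduceByKey` accept. The 500 GB input and the 128 MB target per partition are made-up figures for illustration, and `partitions_for` is a hypothetical helper, not a Spark API.

```python
import math


def partitions_for(total_bytes, target_partition_bytes, min_partitions=1):
    """Hypothetical helper: number of partitions so that each one holds
    roughly target_partition_bytes (assumes keys are spread evenly)."""
    if total_bytes < 0 or target_partition_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and target_partition_bytes > 0")
    return max(min_partitions, math.ceil(total_bytes / target_partition_bytes))


GB = 1024 ** 3
MB = 1024 ** 2

# Assumed figures: ~500 GB going through the shuffle, ~128 MB per partition.
num_partitions = partitions_for(500 * GB, 128 * MB)
print(num_partitions)

# In PySpark the value would then be passed to the shuffle operation, e.g.:
#   left.cogroup(right, numPartitions=num_partitions)
#   pairs.reduceByKey(lambda a, b: a + b, numPartitions=num_partitions)
```

Skewed keys break the even-spread assumption, so treat the result as a starting point and check the per-task sizes in the Spark UI.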
Nacho

On 16 Feb 2016 at 4:58 PM, "JOAQUIN GUANTER GONZALBEZ" <joaquin.guantergonzal...@telefonica.com> wrote:

> Thanks. I'll take a look at Graphite to see if that helps me out with my
> first problem.
>
> Ximo.
>
> -----Original Message-----
> From: Arkadiusz Bicz [mailto:arkadiusz.b...@gmail.com]
> Sent: Tuesday, 16 February 2016 16:06
> To: Iulian Dragoș <iulian.dra...@typesafe.com>
> CC: JOAQUIN GUANTER GONZALBEZ <joaquin.guantergonzal...@telefonica.com>;
> user@spark.apache.org
> Subject: Re: Memory problems and missing heartbeats
>
> I had a problem similar to #2 when I used a lot of caching and then did
> shuffling. It looks like when I cached too much there was not enough space
> for other Spark tasks and it just hung.
>
> You can try to cache less and see if that improves things. Executor logs
> also help a lot (watch for log entries about spills), and you can monitor
> the jobs' JVMs through Spark monitoring
> http://spark.apache.org/docs/latest/monitoring.html and Graphite and
> Grafana.
>
> On Tue, Feb 16, 2016 at 2:14 PM, Iulian Dragoș <iulian.dra...@typesafe.com>
> wrote:
> > Regarding your 2nd problem, my best guess is that you’re seeing GC pauses.
> > It’s not unusual, given you’re using 40GB heaps. See for instance this
> > blog post
> >
> > From conducting numerous tests, we have concluded that unless you are
> > utilizing some off-heap technology (e.g. GridGain OffHeap), no Garbage
> > Collector provided with JDK will render any kind of stable GC
> > performance with heap sizes larger than 16GB. For example, on 50GB
> > heaps we can often encounter up to 5 minute GC pauses, with average
> > pauses of 2 to 4 seconds.
> >
> > Not sure if Yarn can do this, but I would try to run with a smaller
> > executor heap, and more executors per node.
> >
> > iulian
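
Along the lines of Iulian's suggestion, here is a rough sketch of splitting a node's memory into several smaller executors instead of one 40GB heap. The 16GB cap comes from the blog post quoted above; the 48GB of usable memory per node and the 10% allowance for off-heap overhead are assumptions for illustration, and `executor_layout` is a hypothetical helper, not part of Spark or YARN.

```python
import math


def executor_layout(node_memory_gb, max_heap_gb=16, overhead_fraction=0.10):
    """Hypothetical helper: split a node's usable memory into equal executors
    whose heap stays at or below max_heap_gb.

    overhead_fraction is an assumed per-executor allowance for off-heap memory.
    Returns (executors_per_node, heap_gb_per_executor).
    """
    if node_memory_gb <= 0 or max_heap_gb <= 0 or overhead_fraction < 0:
        raise ValueError("memory sizes must be > 0 and overhead_fraction >= 0")
    # Largest footprint one executor may take, heap plus overhead.
    per_executor_cap = max_heap_gb * (1 + overhead_fraction)
    executors = max(1, math.ceil(node_memory_gb / per_executor_cap))
    # Heap that fits once the node's memory is shared out, rounded down.
    heap_gb = math.floor(node_memory_gb / executors / (1 + overhead_fraction))
    return executors, min(heap_gb, max_heap_gb)


# Assumed figure: 48 GB of memory per node available to Spark.
executors, heap_gb = executor_layout(48)
print(executors, heap_gb)

# On YARN this would roughly translate to spark-submit options such as
#   --executor-memory <heap_gb>g --num-executors <executors * number of nodes>
```

Whether this helps depends on the workload; smaller heaps shorten GC pauses but leave less room per executor for cached data.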