Re: Heartbeat lost

2014-11-19 Thread Stephan Ewen
well. > > -Original Message- > From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf Of > Stephan Ewen > Sent: Tuesday, 18 November 2014 14:08 > To: dev@flink.incubator.apache.org > Subject: Re: Heartbeat lost > > The heartbeats currently go through the

RE: Heartbeat lost

2014-11-19 Thread Kruse, Sebastian
en > Sent: Tuesday, 18 November 2014 10:57 > To: dev@flink.incubator.apache.org > Subject: Re: Heartbeat lost > > Yes, that sounds like a good idea. > > I have experienced that occasionally before, under high parallelism > and algorithms where the task manager got lon

Re: Heartbeat lost

2014-11-18 Thread Flavio Pompermaier
Have you considered adopting Reactor instead of Akka? On Nov 18, 2014 10:57 AM, "Stephan Ewen" wrote: > Yes, that sounds like a good idea. > > I have experienced that occasionally before, under high parallelism and > algorithms where the task manager got long garbage collection stalls... > > The d

Re: Heartbeat lost

2014-11-18 Thread Stephan Ewen
r/jobmanager code, to avoid the > suppression of heartbeats. Or am I missing something? > > Cheers, > Sebastian > > -Original Message- > From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf Of > Stephan Ewen > Sent: Tuesday, 18 November 2014 10:57 >

RE: Heartbeat lost

2014-11-18 Thread Kruse, Sebastian
Ewen Sent: Tuesday, 18 November 2014 10:57 To: dev@flink.incubator.apache.org Subject: Re: Heartbeat lost Yes, that sounds like a good idea. I have experienced that occasionally before, under high parallelism and algorithms where the task manager got long garbage collection stalls... The default

Re: Heartbeat lost

2014-11-18 Thread Stephan Ewen
Yes, that sounds like a good idea. I have experienced that occasionally before, under high parallelism and algorithms where the task manager got long garbage collection stalls... The default timeout (30 seconds) can be aggressive for such jobs... Stephan On 18.11.2014 09:47, "Kruse, Sebas

Heartbeat lost

2014-11-18 Thread Kruse, Sebastian
Hi everyone, In some of my jobs, I occasionally encounter the problem that some of the task managers lose the heartbeat connection to the job manager. The job manager did not crash, though. Here is an excerpt from the dashboard: Error: java.lang.Exception: TaskManager lost heartbeat connection to