@Gyula Do you see log messages about quarantined actor systems? There may be an issue with Akka Death watches that once the connection is lost, it cannot be re-established unless the TaskManager is restarted
http://doc.akka.io/docs/akka/current/scala/remoting.html#Lifecycle_and_Failure_Recovery_Model On Thu, Feb 4, 2016 at 5:03 PM, Radu Tudoran <radu.tudo...@huawei.com> wrote: > Hi, > > > > Well…yesterday when I looked into it there was no additional info than the > one I have send. Today I reproduced the problem and I could see in the log > file. > > > > > > akka.actor.ActorInitializationException: exception during creation > > at akka.actor.ActorInitializationException$.apply(Actor.scala:164) > > at akka.actor.ActorCell.create(ActorCell.scala:596) > > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) > > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) > > at > akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279) > > at akka.dispatch.Mailbox.run(Mailbox.scala:220) > > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > Caused by: java.lang.reflect.InvocationTargetException > > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > > at akka.util.Reflect$.instantiate(Reflect.scala:66) > > at akka.actor.ArgsReflectConstructor.produce(Props.scala:352) > > at akka.actor.Props.newActor(Props.scala:252) > > at akka.actor.ActorCell.newActor(ActorCell.scala:552) > > at akka.actor.ActorCell.create(ActorCell.scala:578) > > ... 10 more > > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded > > 11:21:17,423 ERROR > org.apache.flink.runtime.taskmanager.Task > - FATAL - exception in task resource cleanup > > java.lang.OutOfMemoryError: GC overhead limit exceeded > > 11:21:55,160 ERROR > org.apache.flink.runtime.taskmanager.Task > - FATAL - exception in task exception handler > > java.lang.OutOfMemoryError: GC overhead limit exceeded > > > > …. > > > > - Unexpected exception in the selector loop. > > java.lang.OutOfMemoryError: GC overhead limit exceeded > > > > > > Looks like the input flow is faster than the GC collector > > > > Dr. Radu Tudoran > > Research Engineer - Big Data Expert > > IT R&D Division > > > > [image: cid:image007.jpg@01CD52EB.AD060EE0] > > HUAWEI TECHNOLOGIES Duesseldorf GmbH > > European Research Center > > Riesstrasse 25, 80992 München > > > > E-mail: *radu.tudo...@huawei.com <radu.tudo...@huawei.com>* > > Mobile: +49 15209084330 > > Telephone: +49 891588344173 > > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH > Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com > Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063, > Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN > Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063, > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN > > This e-mail and its attachments contain confidential information from > HUAWEI, which is intended only for the person or entity whose address is > listed above. Any use of the information contained herein in any way > (including, but not limited to, total or partial disclosure, reproduction, > or dissemination) by persons other than the intended recipient(s) is > prohibited. If you receive this e-mail in error, please notify the sender > by phone or email immediately and delete it! > > > > *From:* Till Rohrmann [mailto:trohrm...@apache.org] > *Sent:* Thursday, February 04, 2016 4:55 PM > *To:* user@flink.apache.org > *Subject:* Re: release of task slot > > > > Hi Radu, > > what does the log of the TaskManager 10.204.62.80:57910 say? > > Cheers, > Till > > > > > > On Wed, Feb 3, 2016 at 6:00 PM, Radu Tudoran <radu.tudo...@huawei.com> > wrote: > > Hello, > > > > > > I am facing an error which for which I cannot figure the cause. Any idea > what could cause such an error? > > > > > > > > java.lang.Exception: The slot in which the task was executed has been > released. Probably loss of TaskManager a8b69bd9449ee6792e869a9ff9e843e2 @ > cloudr6-admin - 4 slots - URL: akka.tcp:// > flink@10.204.62.80:57910/user/taskmanager > > at > org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:151) > > at > org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547) > > at > org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119) > > at > org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156) > > at > org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215) > > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696) > > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > > at > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > > at > org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100) > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > > at > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) > > at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) > > at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) > > at akka.actor.ActorCell.invoke(ActorCell.scala:486) > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > > Dr. Radu Tudoran > > Research Engineer - Big Data Expert > > IT R&D Division > > > > [image: cid:image007.jpg@01CD52EB.AD060EE0] > > HUAWEI TECHNOLOGIES Duesseldorf GmbH > > European Research Center > > Riesstrasse 25, 80992 München > > > > E-mail: *radu.tudo...@huawei.com <radu.tudo...@huawei.com>* > > Mobile: +49 15209084330 > > Telephone: +49 891588344173 > > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH > Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com > Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063, > Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN > Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063, > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN > > This e-mail and its attachments contain confidential information from > HUAWEI, which is intended only for the person or entity whose address is > listed above. Any use of the information contained herein in any way > (including, but not limited to, total or partial disclosure, reproduction, > or dissemination) by persons other than the intended recipient(s) is > prohibited. If you receive this e-mail in error, please notify the sender > by phone or email immediately and delete it! > > > > >