Hi Congxian 我这里改用了G1收集器,而且我在taskmanager上看到fgc的次数也才几次,最多十几次,没有特别严重的gc。并且,也把状态后端放到了rockDB上了。 但是,跑了不到一小时,也是报异常,在taskmanager的日志上看到的日志为: org.apache.flink.util.FlinkException: The assigned slot container_e23_1545597259276_0490_01_000006_0 was removed. at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:893) at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:863) at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1058) at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:385) at org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:825) at org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener$1.run(ResourceManager.java:1139) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40) at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165) at akka.actor.Actor$class.aroundReceive(Actor.scala:502) at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) at akka.actor.ActorCell.invoke(ActorCell.scala:495) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) at akka.dispatch.Mailbox.run(Mailbox.scala:224) at akka.dispatch.Mailbox.exec(Mailbox.scala:234) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 我分配的taskmanager 资源为:一个taskmanager 8个G,8个slot 这里是否为资源不足导致的呢?还是有其他原因?
Best, JiaCheng a773807...@gmail.com 发件人: Congxian Qiu 发送时间: 2019-02-14 14:38 收件人: a773807943 抄送: user-zh 主题: Re: flink使用异常超时 Hi 可以考虑是否 taskmanager 的 GC 比较严重 Best, Congxian cousin-gmail <a773807...@gmail.com> 于2019年2月14日周四 下午2:34写道: 嘿,我这里使用flink on yarn中,经常报出异常,然后flink就自己关闭了。 里面具体的逻辑是从kafka中接收数据,然后按照enentTime中的window滑动窗口滑动, 窗口大小为1小时,滑动间隔是5秒。聚集数据后,就写到redis中。 一般运行了2个小时候,就报异常,然后就结束了任务。其中,jobmanager的日志中显 示为: java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id container_e23_1545597259276_0273_01_001220 timed out. at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.no tifyHeartbeatTimeout(JobMaster.java:1624) at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor. run(HeartbeatManagerImpl.java:339) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter $ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDis patcher.scala:415) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1 339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java :107) 这里面显示taskmanager超时了,但是在taskmanager对应的日志中,是没有具体的异常 的。请问这个是什么原因导致的呢?