Hi Amit, We recently fixed a bug in the network stack that affected batch jobs (FLINK-9144). The fix was added after your commit.
Do you have a chance to build the current release-1.5 branch and check if the fix also resolves your problem? Otherwise it would be great if you could open a blocker issue for the 1.5 release to ensure that this is fixed. Thanks, Fabian 2018-04-29 18:30 GMT+02:00 Amit Jain <aj201...@gmail.com>: > Cluster is running on commit 2af481a > > On Sun, Apr 29, 2018 at 9:59 PM, Amit Jain <aj201...@gmail.com> wrote: > > Hi, > > > > We are running numbers of batch jobs in Flink 1.5 cluster and few of > those > > are getting stuck at random. These jobs having the following failure > after > > which operator status changes to CANCELED and stuck to same. > > > > Please find complete TM's log at > > https://gist.github.com/imamitjain/066d0e99990ee24f2da1ddc83eba2012 > > > > > > 2018-04-29 14:57:24,437 INFO org.apache.flink.runtime.taskmanager.Task > > - Producer 98f5976716234236dc69fb0e82a0cc34 of partition > > 5b8dd5ac2df14266080a8e049defbe38 disposed. Cancelling execution. > > org.apache.flink.runtime.jobmanager.PartitionProducerDisposedException: > > Execution 98f5976716234236dc69fb0e82a0cc34 producing partition > > 5b8dd5ac2df14266080a8e049defbe38 has already been disposed. > > at > > org.apache.flink.runtime.jobmaster.JobMaster.requestPartitionState( > JobMaster.java:610) > > at sun.reflect.GeneratedMethodAccessor107.invoke(Unknown Source) > > at > > sun.reflect.DelegatingMethodAccessorImpl.invoke( > DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:498) > > at > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation( > AkkaRpcActor.java:210) > > at > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor. > handleMessage(AkkaRpcActor.java:154) > > at > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleMessage( > FencedAkkaRpcActor.java:69) > > at > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$ > onReceive$1(AkkaRpcActor.java:132) > > at akka.actor.ActorCell$$anonfun$become$1.applyOrElse( > ActorCell.scala:544) > > at akka.actor.Actor$class.aroundReceive(Actor.scala:502) > > at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) > > at akka.actor.ActorCell.invoke(ActorCell.scala:495) > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) > > at akka.dispatch.Mailbox.run(Mailbox.scala:224) > > at akka.dispatch.Mailbox.exec(Mailbox.scala:234) > > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > at > > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue. > runTask(ForkJoinPool.java:1339) > > at scala.concurrent.forkjoin.ForkJoinPool.runWorker( > ForkJoinPool.java:1979) > > at > > scala.concurrent.forkjoin.ForkJoinWorkerThread.run( > ForkJoinWorkerThread.java:107) > > > > > > Thanks > > Amit >