I think you have at least two different exceptions.

> java.lang.Exception: Container released on a *lost* node

This usually means a Yarn NodeManager is down, so all the containers
running on that node will be released and rescheduled to new ones. If you
want to figure out the root cause, you need to check the Yarn NodeManager
logs.
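When a node is lost, the aggregated YARN container logs are often the quickest way to see what happened before the containers were released. A minimal sketch, assuming YARN log aggregation is enabled on the cluster (`<application_id>` is a placeholder for your Flink application's ID):

```shell
# Find the application ID of the Flink job, then pull its aggregated logs.
# <application_id> is a placeholder; list running/finished apps first:
yarn application -list -appStates ALL

# Dump all container logs (JobManager + TaskManagers) to a local file:
yarn logs -applicationId <application_id> > flink-app.log
```

If log aggregation is not enabled, the logs stay on the individual nodes under the NodeManager's local log directory instead.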
> java.lang.OutOfMemoryError: Metaspace

Could you check the value of the Flink configuration
"taskmanager.memory.jvm-metaspace.size"? If it is too small, increasing it
will help. Usually, 256m is enough for most cases.

Best,
Yang

Vijayendra Yadav <contact....@gmail.com> wrote on Tue, Aug 25, 2020 at 4:51 AM:

> Another one -
>
> Exception in thread "FileCache shutdown hook"
> Exception: java.lang.OutOfMemoryError thrown from the
> UncaughtExceptionHandler in thread "FileCache shutdown hook"
>
> Regards,
> Vijay
>
> On Mon, Aug 24, 2020 at 1:04 PM Vijayendra Yadav <contact....@gmail.com> wrote:
>
>> Actually got this message in the rolled-over container logs:
>>
>> [org.slf4j.impl.Log4jLoggerFactory]
>> Exception in thread "cb-timer-1-1" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "Thread-16" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "TransientBlobCache shutdown hook" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "FileChannelManagerImpl-io shutdown hook" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "Kafka Fetcher for Source: flink-kafka-consumer -> Map -> Filter -> Map -> Sink: s3-sink-raw (2/3)" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "FileCache shutdown hook" java.lang.OutOfMemoryError: Metaspace
>>
>> Any suggestions on how to fix it?
>>
>> On Mon, Aug 24, 2020 at 12:53 PM Vijayendra Yadav <contact....@gmail.com> wrote:
>>
>>> Hi Team,
>>>
>>> Running a Flink job on Yarn, I am trying to make connections to
>>> Couchbase DB in one of my map functions in a Flink streaming job. But my
>>> task manager containers keep failing, and Yarn keeps assigning new
>>> containers, not giving me an opportunity to get any useful logs.
>>>
>>> val cluster = Cluster.connect("host", "user", "pwd")
>>> val bucket = cluster.bucket("bucket")
>>> val collection = bucket.defaultCollection
>>>
>>> The only thing I see is the Yarn exception:
>>>
>>> java.lang.Exception: Container released on a *lost* node
>>>   at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:343)
>>>   at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
>>>   at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
>>>   at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>   at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>>>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>   at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>>   at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>>   at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>>   at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>>   at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>>   at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>>   at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>   at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>   at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>   at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>> Could you please provide any insight on how to get the logs, and why a
>>> simple connection will not work?
>>>
>>> Note: it works on my local system's Yarn.
>>>
>>> Regards,
>>> Vijay
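For reference, the metaspace setting suggested in the reply above is an ordinary Flink configuration key set in flink-conf.yaml. A minimal sketch using the 256m rule-of-thumb from the thread (the right value depends on your job and the libraries it loads, such as the Couchbase SDK):

```yaml
# flink-conf.yaml
# Raise the JVM metaspace budget of each TaskManager process.
# 256m is the "usually enough" value from the reply above; increase it
# further if the java.lang.OutOfMemoryError: Metaspace errors persist.
taskmanager.memory.jvm-metaspace.size: 256m
```

Note that metaspace counts toward the total process memory of the container, so a larger value may also require raising the overall TaskManager memory so YARN does not kill the container.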