I think you have at least two different exceptions here.

> java.lang.Exception: Container released on a *lost* node
This usually means a Yarn NodeManager is down, so all the containers
running on that node are released and rescheduled to other nodes. To
find the root cause, you need to check the Yarn NodeManager logs.
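
If log aggregation is enabled, the container logs can also be pulled with the YARN CLI instead of logging into the node (the application and container ids below are placeholders; substitute the ones from your run):

```shell
# Fetch the aggregated logs for the whole application (placeholder id).
yarn logs -applicationId application_1598000000000_0001

# Narrow down to the single lost container once you know its id.
yarn logs -applicationId application_1598000000000_0001 \
    -containerId container_1598000000000_0001_01_000002
```

This often surfaces the TaskManager's last log lines even after the container itself is gone.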

> java.lang.OutOfMemoryError: Metaspace
Could you check the value of the Flink configuration option
"taskmanager.memory.jvm-metaspace.size"? If it is too small, increasing
it will help. Usually, 256m is enough for most cases.
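
For reference, assuming the standard flink-conf.yaml layout, raising it looks like this (256m is the value suggested above; on Yarn you can also pass it per job with `-yD key=value` on the CLI):

```yaml
# flink-conf.yaml: JVM metaspace budget of each TaskManager process.
taskmanager.memory.jvm-metaspace.size: 256m
```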


Best,
Yang

Vijayendra Yadav <contact....@gmail.com> wrote on Tue, Aug 25, 2020 at 4:51 AM:

> Another one -
>
> Exception in thread "FileCache shutdown hook"
> Exception: java.lang.OutOfMemoryError thrown from the
> UncaughtExceptionHandler in thread "FileCache shutdown hook"
>
> Regards,
> Vijay
>
> On Mon, Aug 24, 2020 at 1:04 PM Vijayendra Yadav <contact....@gmail.com>
> wrote:
>
>> Actually got this message in the rolled-over container logs:
>>
>> [org.slf4j.impl.Log4jLoggerFactory]
>> Exception in thread "cb-timer-1-1" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "Thread-16" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "TransientBlobCache shutdown hook" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "FileChannelManagerImpl-io shutdown hook" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "Kafka Fetcher for Source: flink-kafka-consumer -> Map -> Filter -> Map -> Sink: s3-sink-raw (2/3)" java.lang.OutOfMemoryError: Metaspace
>> Exception in thread "FileCache shutdown hook" java.lang.OutOfMemoryError: Metaspace
>>
>> Any suggestions on how to fix it?
>>
>>
>>
>> On Mon, Aug 24, 2020 at 12:53 PM Vijayendra Yadav <contact....@gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> I am running a Flink streaming job on Yarn and trying to connect to a
>>> Couchbase DB in one of my map functions. But my task manager containers
>>> keep failing, and Yarn keeps assigning new containers, which gives me no
>>> opportunity to get any useful logs.
>>>
>>>  val cluster = Cluster.connect("host", "user", "pwd")
>>>  val bucket = cluster.bucket("bucket")
>>>  val collection = bucket.defaultCollection
>>>
>>> The only thing I see is a Yarn exception:
>>>
>>> java.lang.Exception: Container released on a *lost* node
>>>     at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:343)
>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
>>>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>>>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>>
>>>
>>> Could you please provide any insight on how to get the logs, and why
>>> such a simple connection will not work?
>>>
>>> Note: it works on Yarn on my local system.
>>>
>>> Regards,
>>> Vijay
>>>
>>
