Hi Marco,

How are you starting the job? For example, are you using YARN as the
resource manager? It looks like there are just not enough resources in the
cluster to run this job. Assuming the cluster is correctly configured and
the Task Managers are able to connect to the Job Manager (can you share the
full JM/TM logs?), I would say your job is simply too large (parallelism of
32?) for the given configuration.
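
As a rough sanity check, here is a small sketch of the slot arithmetic. It
assumes default slot sharing (so the job needs as many slots as its highest
operator parallelism, 32 in your log) and that `taskmanager.numberOfTaskSlots`
is still at the Flink default of 1. I don't know your actual setting, so
SLOTS_PER_TM below is a placeholder, and the helper names are made up for
illustration. The 300000 ms in your stack trace matches the default
`slot.request.timeout`; raising it would not help if the slots don't exist.

```python
import math

def slots_available(task_managers: int, slots_per_tm: int) -> int:
    """Total slots the cluster can offer."""
    return task_managers * slots_per_tm

def min_slots_per_tm(parallelism: int, task_managers: int) -> int:
    """Smallest per-TM slot count that can host the given parallelism."""
    return math.ceil(parallelism / task_managers)

PARALLELISM = 32    # from "(15/32)" in the log
TASK_MANAGERS = 3   # from the original message
SLOTS_PER_TM = 1    # assumption: taskmanager.numberOfTaskSlots left at default

available = slots_available(TASK_MANAGERS, SLOTS_PER_TM)
needed_per_tm = min_slots_per_tm(PARALLELISM, TASK_MANAGERS)

if available < PARALLELISM:
    print(f"Only {available} slots for parallelism {PARALLELISM}; "
          f"need at least {needed_per_tm} slots per Task Manager, "
          f"or a lower parallelism.")
else:
    print(f"{available} slots are enough for parallelism {PARALLELISM}.")
```

If your numbers look like this, either lower the job parallelism or raise
`taskmanager.numberOfTaskSlots` (keeping in mind that each slot gets a smaller
share of the Task Manager's memory).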

Best,
Piotrek

On Tue, 25 May 2021 at 06:10, Marco Villalobos <mvillalo...@kineteque.com>
wrote:

> I am running with one job manager and three task managers.
>
> Each task manager is receiving at most 8 GB of data, but the job is timing
> out.
>
> What parameters must I adjust?
>
> Sink: back fill db sink) (15/32) (50626268d1f0d4c0833c5fa548863abd)
> switched from SCHEDULED to FAILED on [unassigned resource].
> java.util.concurrent.CompletionException:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Slot request bulk is not fulfillable! Could not allocate the required slot
> within slot request timeout
>     at
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> ~[?:1.8.0_282]
>     at
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> ~[?:1.8.0_282]
>     at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
> ~[?:1.8.0_282]
>     at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> ~[?:1.8.0_282]
>     at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> ~[?:1.8.0_282]
>     at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> ~[?:1.8.0_282]
>     at
> org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:223)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:168)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:86)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:91)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[?:1.8.0_282]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[?:1.8.0_282]
>     at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:442)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:209)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:159)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.actor.Actor.aroundReceive(Actor.scala:517)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.actor.Actor.aroundReceive$(Actor.scala:515)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> [feature-LUM-3882-toledo--850a6747.jar:?]
>     at
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> Caused by:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Slot request bulk is not fulfillable! Could not allocate the required slot
> within slot request timeout
>     at
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
>     ... 26 more
> Caused by: java.util.concurrent.TimeoutException: Timeout has occurred:
> 300000 ms
>     at
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0
>
