Re: NoResourceAvailableException on taskmanager(s)

2021-11-04 Thread Yangze Guo
Hi, Deniz. The exception implies that there are not enough slots in your standalone cluster. You need to increase `taskmanager.numberOfTaskSlots` or the `numberOfTaskManagers`. You can search the JobManager log for "Received resource requirements from job", which indicates how many slot
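
The slot arithmetic behind this advice can be sketched as follows. This is a minimal illustration, not Flink code: the function names are hypothetical, and `required_slots` is assumed to be the number you read from the "Received resource requirements from job" log line mentioned above. It only assumes that a standalone cluster offers (number of TaskManagers) x (slots per TaskManager) slots in total.

```python
def total_slots(num_task_managers: int, slots_per_task_manager: int) -> int:
    """Slots offered by a standalone cluster (hypothetical helper)."""
    return num_task_managers * slots_per_task_manager


def missing_slots(required_slots: int,
                  num_task_managers: int,
                  slots_per_task_manager: int) -> int:
    """How many more slots the job needs than the cluster offers; 0 if enough."""
    available = total_slots(num_task_managers, slots_per_task_manager)
    return max(0, required_slots - available)


if __name__ == "__main__":
    # Illustrative numbers only: a job asking for 4 slots on 1 TaskManager
    # with 1 slot is short by some slots, so either setting must be raised.
    print(missing_slots(required_slots=4, num_task_managers=1, slots_per_task_manager=1))
```

If the result is greater than zero, raising either factor until the product covers `required_slots` closes the gap.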

NoResourceAvailableException on taskmanager(s)

2021-11-04 Thread Deniz Koçak
Hi, We have been running our job on flink image 1.13.2-stream1-scala_2.12-java11. It's a standalone deployment on a Kubernetes cluster (EKS on AWS, which uses EC2 nodes as hosts and also depends on an auto-scaler to adjust the resources cluster wide). After a few minutes (5-20) we see the exception be

Re: NoResourceAvailableException

2020-10-28 Thread Khachatryan Roman
localhost:8081/#/task-manager by default). Before running the program, there should be 1 TM with 1 slot available which should be free (with default settings). If there are other jobs, you can increase slots per TM by increasing

Re: NoResourceAvailableException

2020-10-19 Thread Khachatryan Roman
OfTaskSlots in flink-conf.yaml [1]. [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#taskmanager-numberoftaskslots Regards, Roman On Wed, Oct 14, 2020 at 6:56 PM Alexander Semeshchenko

Re: NoResourceAvailableException

2020-10-08 Thread Khachatryan Roman
I assume that before submitting a job you started a cluster with default settings with ./bin/start-cluster.sh. Did you submit any other jobs? Can you share the logs from the log folder? Regards, Roman On Wed, Oct 7, 2020 at 11:03 PM Alexander Semeshchenko wrote:

NoResourceAvailableException

2020-10-07 Thread Alexander Semeshchenko
After installing (download & tar zxf) Apache Flink 1.11.1 and running ./bin/flink run examples/streaming/WordCount.jar, it shows this message about 5 minutes after the submission attempt: Caused by: org.apache.flink.runtime.jobmanager.schedul

Re: NoResourceAvailableException and JobNotFound Errors

2020-06-02 Thread Zhu Zhu
Hi Prasanna, The job failed because it failed to acquire enough slots to run tasks. Did you launch any task manager? The JobNotFound exception is thrown because someone (possibly the Flink UI) sent a query for a job that does not exist in the Flink cluster. From the log you attached, the job id of yo

NoResourceAvailableException and JobNotFound Errors

2020-06-02 Thread Prasanna kumar
Hi, I am running Flink locally on my machine with the following configuration. # The RPC port where the JobManager is reachable. jobmanager.rpc.port: 6123 # The heap size for the JobManager JVM jobmanager.heap.size: 1024m # The heap size for the TaskManager JVM taskmanager.heap.size: 1024m

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-09-18 Thread Till Rohrmann
I realized for failed containers (that exited for a specific we still were Requesting new TM container and launching TM). But for the "Detected unreachable: [akka.tcp://fl...@blahabc.sfdc.net:123]" issue I do not see the container m

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-09-06 Thread Till Rohrmann
e to acquire a new container TM, launch TM and it is reported as started, the org.apache.flink.runtime.jobmanager.scheduler throws a NoResourceAvailableException that causes a failure. In our case we had fixed restart strategy with 5, and we are running out of it because of this. I am lookin

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-09-05 Thread Subramanya Suresh
arted, the org.apache.flink.runtime.jobmanager.scheduler throws a NoResourceAvailableException that causes a failure. In our case we had fixed restart strategy with 5, and we are running out of it because of this. I am looking to solve this with a FailureRateRestartStrategy over 2 minutes interval (10 second restart delay, >12 fa
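
The difference between the two restart policies discussed here can be sketched as follows. This is a generic sliding-window illustration under stated assumptions, not Flink's implementation: the class names, method names, and the threshold of 3 are hypothetical, and only the 2-minute window comes from the message above. The point it shows is that a fixed attempt budget is used up over the job's lifetime, while a failure-rate policy only counts failures inside a recent window.

```python
from collections import deque


class FixedAttemptsPolicy:
    """Allows a fixed number of restarts over the whole job lifetime (sketch)."""

    def __init__(self, max_attempts: int) -> None:
        self.max_attempts = max_attempts
        self.used = 0

    def on_failure(self, now: float) -> bool:
        """Record a failure; return True if a restart is still allowed."""
        self.used += 1
        return self.used <= self.max_attempts


class FailureRatePolicy:
    """Allows restarts while failures inside a sliding window stay under a cap (sketch)."""

    def __init__(self, max_failures_per_interval: int, interval_seconds: float) -> None:
        self.max_failures = max_failures_per_interval
        self.interval = interval_seconds
        self._timestamps = deque()

    def on_failure(self, now: float) -> bool:
        """Record a failure at time `now`; return True if a restart is still allowed."""
        self._timestamps.append(now)
        # Forget failures that are older than the window.
        while self._timestamps and now - self._timestamps[0] > self.interval:
            self._timestamps.popleft()
        return len(self._timestamps) <= self.max_failures


if __name__ == "__main__":
    fixed = FixedAttemptsPolicy(max_attempts=5)
    rate = FailureRatePolicy(max_failures_per_interval=3, interval_seconds=120)
    # One isolated failure every 10 minutes: the fixed budget eventually runs out,
    # the rate-based policy keeps allowing restarts.
    for i in range(8):
        t = i * 600.0
        print(t, fixed.on_failure(t), rate.on_failure(t))
```

A rate-based policy still gives up when failures cluster tightly, which is the case it is meant to catch.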

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-31 Thread Subramanya Suresh
t of SQL queries we run). The gist is Akka detects unreachable, TM marked lost and unregistered by JM, operators start failing with NoResourceAvailableException since there was one less TM, 5 retry attempts later job goes down.

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-31 Thread Till Rohrmann
ueries we run). The gist is Akka detects unreachable, TM marked lost and unregistered by JM, operators start failing with NoResourceAvailableException since there was one less TM, 5 retry attempts later job goes down. . 2018-08-29 23:02:41,216 INFO

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-31 Thread Subramanya Suresh
flink.yarn (we have huge logs otherwise, given the amount of SQL queries we run). The gist is Akka detects unreachable, TM marked lost and unregistered by JM, operators start failing with NoResourceAvailableException since there was one less TM, 5 retry attempts later job goes down. . 2018-08-29 23:02

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-30 Thread Till Rohrmann
ost and then never re-allocated and subsequently operators fail with NoResourceAvailableException and after 5 restarts (we have FixedDelay restarts of 5) the application goes down. - We have explicitly set *yarn.reallocate-failed: *true and have not specified th

Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-29 Thread Subramanya Suresh
Hi, we are seeing a weird issue where one TaskManager is lost and then never re-allocated and subsequently operators fail with NoResourceAvailableException and after 5 restarts (we have FixedDelay restarts of 5) the application goes down. - We have explicitly set *yarn.reallocate-failed: *true

Re: docker, error NoResourceAvailableException..

2018-08-15 Thread Esteban Serrano
nt: *- JOB_MANAGER_RPC_ADDRESS=jobmanager This will give you 1 Job Manager and 2 Task Managers with one task slot each, so 2 task slots in total. 2) You can deploy 1 Job Manager and 1 Task Manager. Then you need to modify

Re: docker, error NoResourceAvailableException..

2018-08-15 Thread shyla deshpande
g setting: *taskmanager.numberOfTaskSlots: *2 This will give you 2 Task Slots with only 1 Task Manager. But you will need to somehow override config in the container, possibly using: https://docs.docker.com/storage/volumes/ Regards, Dominik.

Re: docker, error NoResourceAvailableException..

2018-08-15 Thread Dominik Wosiński
Slots: 2 This will give you 2 Task Slots with only 1 Task Manager. But you will need to somehow override config in the container, possibly using: https://docs.docker.com/storage/volumes/ Regards, Dominik. From: shyla deshpande Sent: Wednesday, 15 August 2018 07:23 To: user Subject: docker, error N

docker, error NoResourceAvailableException..

2018-08-14 Thread shyla deshpande
Hello all, Trying to use Docker as a single-node Flink cluster. docker run --name flink_local -p 8081:8081 -t flink local I submitted a job to the cluster using the Web UI. The job failed. I see this error message in the docker logs. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvaila

Re: Reducing parallelism leads to NoResourceAvailableException

2016-04-28 Thread Ken Krugler
Hi all, In trying out different settings for performance, I run into a job failure case that puzzles me. I’d done a run with a parallelism of 20 (-p 20 via CLI), and the job ran successfully, on a cluster with 40 slots. I then

Re: Reducing parallelism leads to NoResourceAvailableException

2016-04-28 Thread Ken Krugler
case that puzzles me. I’d done a run with a parallelism of 20 (-p 20 via CLI), and the job ran successfully, on a cluster with 40 slots. I then tried with -p 15, and it failed with: NoResourceAvailableException: Not enough free slots available to run the job.

Re: Reducing parallelism leads to NoResourceAvailableException

2016-04-28 Thread Ufuk Celebi
successfully, on a cluster with 40 slots. I then tried with -p 15, and it failed with: NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism… But the change was to reduce parallelism - why would

Re: Reducing parallelism leads to NoResourceAvailableException

2016-04-28 Thread Aljoscha Krettek
t puzzles me. I’d done a run with a parallelism of 20 (-p 20 via CLI), and the job ran successfully, on a cluster with 40 slots. I then tried with -p 15, and it failed with: NoResourceAvailableException: Not enough free slots available to run the job. You c

Reducing parallelism leads to NoResourceAvailableException

2016-04-27 Thread Ken Krugler
: NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism… But the change was to reduce parallelism - why would that now cause this problem? Thanks, — Ken -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com custom big data