Hi, thanks for the reply.
These errors occur on jobs that have already been successfully deployed and are 
running.

When such an error occurs, the operator starts to treat the job as being in 
the DEPLOYING or DEPLOYED_NOT_READY status, although the job stays in the 
RUNNING state the whole time and no actions are performed on it.
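
One way to double-check the job state independently of the operator is the 
JobManager REST endpoint /jobs/overview. A minimal sketch, assuming the REST 
port is reachable at some base URL (for example through a local port-forward); 
the function names and the sample job id are illustrative only:

```python
import json
from urllib.request import urlopen


def job_states(overview):
    """Map job id -> state from a parsed /jobs/overview response."""
    return {job["jid"]: job["state"] for job in overview.get("jobs", [])}


def fetch_job_states(base_url, timeout=10):
    """Query the JobManager REST API.

    base_url is an assumption, e.g. "http://localhost:8081" behind a
    port-forward to the JobManager REST service.
    """
    with urlopen(f"{base_url}/jobs/overview", timeout=timeout) as resp:
        return job_states(json.load(resp))


# Offline example with a hand-written response (hypothetical job id).
sample = {"jobs": [{"jid": "abc123", "name": "job-name", "state": "RUNNING"}]}
print(job_states(sample))
```

If this reports RUNNING while the operator still shows DEPLOYING, the mismatch 
is on the operator's observation side rather than in the job itself.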

This problem seems to have appeared after we updated the FlinkDeployment 
resource to upgrade the version of the running job.


2023-06-08 06:31:02,741 o.a.f.k.o.o.JobStatusObserver  [WARN 
][job-name/job-name] Exception while listing jobs
2023-06-08 06:31:02,741 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] Observing JobManager deployment. Previous status: READY
2023-06-08 06:31:03,758 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] JobManager is being deployed
2023-06-08 06:31:03,824 o.a.f.k.o.l.AuditUtils         [INFO 
][job-name/job-name] >>> Status | Info    | STABLE          | The resource 
deployment is considered to be stable and won’t be rolled back
2023-06-08 06:31:03,825 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-08 06:31:03,825 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
2023-06-08 06:31:03,825 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] End of reconciliation
2023-06-08 06:31:13,828 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] Starting reconciliation
2023-06-08 06:31:13,829 o.a.f.k.o.s.FlinkResourceContextFactory [INFO 
][job-name/job-name] Getting service for job-name
2023-06-08 06:31:13,829 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] Observing JobManager deployment. Previous status: DEPLOYING
2023-06-08 06:31:14,849 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] JobManager is being deployed
2023-06-08 06:31:14,850 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-08 06:31:14,850 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
2023-06-08 06:31:14,850 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] End of reconciliation
2023-06-08 06:31:24,853 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] Starting reconciliation
2023-06-08 06:31:24,854 o.a.f.k.o.s.FlinkResourceContextFactory [INFO 
][job-name/job-name] Getting service for job-name
2023-06-08 06:31:24,854 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] Observing JobManager deployment. Previous status: DEPLOYING
2023-06-08 06:31:24,858 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] JobManager deployment port is ready, waiting for the Flink 
REST API...
2023-06-08 06:31:24,926 o.a.f.k.o.l.AuditUtils         [INFO 
][job-name/job-name] >>> Status | Info    | STABLE          | The resource 
deployment is considered to be stable and won’t be rolled back
2023-06-08 06:31:24,927 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-08 06:31:24,927 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
2023-06-08 06:31:24,927 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] End of reconciliation
2023-06-08 06:31:34,930 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] Starting reconciliation
2023-06-08 06:31:34,931 o.a.f.k.o.s.FlinkResourceContextFactory [INFO 
][job-name/job-name] Getting service for job-name
2023-06-08 06:31:34,931 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] Observing JobManager deployment. Previous status: 
DEPLOYED_NOT_READY
2023-06-08 06:31:34,931 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] JobManager deployment is ready
2023-06-08 06:31:34,931 o.a.f.k.o.o.JobStatusObserver  [INFO 
][job-name/job-name] Observing job status
2023-06-08 06:31:34,936 o.a.f.k.o.o.JobStatusObserver  [INFO 
][job-name/job-name] Job status changed from RECONCILING to RUNNING
2023-06-08 06:31:34,960 o.a.f.k.o.l.AuditUtils         [INFO 
][job-name/job-name] >>> Event  | Info    | JOBSTATUSCHANGED | Job status 
changed from RECONCILING to RUNNING
2023-06-08 06:31:35,031 o.a.f.k.o.l.AuditUtils         [INFO 
][job-name/job-name] >>> Status | Info    | STABLE          | The resource 
deployment is considered to be stable and won’t be rolled back
2023-06-08 06:31:35,032 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-08 06:31:35,032 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
2023-06-08 06:31:35,032 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] End of reconciliation
2023-06-08 06:32:35,035 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] Starting reconciliation
2023-06-08 06:32:35,035 o.a.f.k.o.s.FlinkResourceContextFactory [INFO 
][job-name/job-name] Getting service for job-name
2023-06-08 06:32:35,036 o.a.f.k.o.o.JobStatusObserver  [INFO 
][job-name/job-name] Observing job status
2023-06-08 06:32:35,044 o.a.f.k.o.o.JobStatusObserver  [INFO 
][job-name/job-name] Job status (RUNNING) unchanged
2023-06-08 06:32:35,049 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-08 06:32:35,049 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
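
The status sequence in the log above can be pulled out mechanically. A minimal 
sketch, assuming the wrapped log entries have first been joined back into one 
line each; the function name is made up for illustration:

```python
import re

# Matches the observer message "... Previous status: X" and captures the
# timestamp (date and time) plus the status name.
PREV_STATUS = re.compile(r"^(\S+ \S+) .*Previous status:\s*(\w+)")


def jm_status_timeline(lines):
    """Return (timestamp, previous_status) pairs in log order."""
    timeline = []
    for line in lines:
        match = PREV_STATUS.search(line)
        if match:
            timeline.append((match.group(1), match.group(2)))
    return timeline


prefix = "o.a.f.k.o.o.d.ApplicationObserver [INFO ][job-name/job-name] "
message = "Observing JobManager deployment. Previous status: "
sample = [
    "2023-06-08 06:31:02,741 " + prefix + message + "READY",
    "2023-06-08 06:31:03,758 " + prefix + "JobManager is being deployed",
    "2023-06-08 06:31:13,829 " + prefix + message + "DEPLOYING",
    "2023-06-08 06:31:24,854 " + prefix + message + "DEPLOYING",
    "2023-06-08 06:31:34,931 " + prefix + message + "DEPLOYED_NOT_READY",
]

for timestamp, status in jm_status_timeline(sample):
    print(timestamp, status)
```

For the excerpt above this gives READY, DEPLOYING, DEPLOYING, 
DEPLOYED_NOT_READY over roughly 30 seconds.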

Kubernetes configuration of Flink:
kubernetes.cluster-id: job-name
kubernetes.container.image.pull-policy: Always
kubernetes.container.image: flink:1.14.4-java11
kubernetes.internal.jobmanager.entrypoint.class: org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
kubernetes.jobmanager.annotations: flinkdeployment.flink.apache.org/generation:3
kubernetes.jobmanager.cpu: 4.0
kubernetes.jobmanager.labels: job_version:0.3.0
kubernetes.jobmanager.memory.limit-factor: 1.3
kubernetes.jobmanager.owner.reference: blockOwnerDeletion:false,controller:false,name:job-name,uid:b118a60b-80a2-43f9-933c-d1510e63bf6c,kind:FlinkDeployment,apiVersion:flink.apache.org/v1beta1
kubernetes.jobmanager.replicas: 1
kubernetes.namespace: job-name
kubernetes.pod-template-file.jobmanager: /tmp/flink_op_generated_podTemplate_8388768779635722075.yaml
kubernetes.pod-template-file.taskmanager: /tmp/flink_op_generated_podTemplate_8986511200228142287.yaml
kubernetes.pod-template-file: /tmp/flink_op_generated_podTemplate_11143683886521703748.yaml
kubernetes.rest-service.exposed.type: Headless_ClusterIP
kubernetes.service-account: flink
kubernetes.taskmanager.cpu: 12.0
kubernetes.taskmanager.labels: job_version:0.3.0
kubernetes.taskmanager.memory.limit-factor: 1.1

This is what it looks like in the metrics:
[inline image: metrics screenshot, not preserved in the archive]

________________________________
From: Shammon FY <zjur...@gmail.com>
Sent: June 8, 2023, 12:55:38
To: Evgeniy Lyutikov
Cc: user@flink.apache.org
Subject: Re: Kubernetes operator listing jobs TimeoutException

Hi Evgeniy,

From the following exception message:

        at 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:123)
        at 
org.apache.flink.runtime.rest.RestClient.submitRequest(RestClient.java:469)
        at 
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:392)
        at 
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:306)
        at 
org.apache.flink.client.program.rest.RestClusterClient.lambda$null$37(RestClusterClient.java:931)
        at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)

It seems that the client failed when it tried to submit a job to the Flink 
cluster through the REST API. Could you provide more information, such as the 
k8s config for the job, so the community can better help analyze the problem?


Best,
Shammon FY

On Wed, Jun 7, 2023 at 11:35 PM Evgeniy Lyutikov <eblyuti...@avito.ru> wrote:

Hello.
We use Kubernetes operator 1.4.0, which serves about 50 jobs. Sometimes errors 
appear in the logs, and they are reflected in the metrics 
(FlinkDeployment.JmDeploymentStatus.READY.Count). What causes these errors?


2023-06-07 15:28:27,601 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] Starting reconciliation
2023-06-07 15:28:27,602 o.a.f.k.o.s.FlinkResourceContextFactory [INFO 
][job-name/job-name] Getting service for job-name
2023-06-07 15:28:27,602 o.a.f.k.o.o.JobStatusObserver  [INFO 
][job-name/job-name] Observing job status
2023-06-07 15:28:39,623 o.a.f.s.n.i.n.c.AbstractChannel [WARN ] Force-closing a 
channel whose registration task was not accepted by an event loop: [id: 
0xd494f516]
java.util.concurrent.RejectedExecutionException: event executor terminated
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:923)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:350)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:343)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:825)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:815)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:483)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86)
        at 
org.apache.flink.shaded.netty4.io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:323)
        at 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:155)
        at 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:139)
        at 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:123)
        at 
org.apache.flink.runtime.rest.RestClient.submitRequest(RestClient.java:469)
        at 
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:392)
        at 
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:306)
        at 
org.apache.flink.client.program.rest.RestClusterClient.lambda$null$37(RestClusterClient.java:931)
        at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
        at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at 
java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
        at 
java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:649)
        at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2023-06-07 15:28:39,624 o.a.f.s.n.i.n.u.c.D.rejectedExecution [ERROR] Failed to 
submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:923)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:350)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:343)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:825)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:815)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:499)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:184)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
        at 
org.apache.flink.runtime.rest.RestClient.submitRequest(RestClient.java:473)
        at 
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:392)
        at 
org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:306)
        at 
org.apache.flink.client.program.rest.RestClusterClient.lambda$null$37(RestClusterClient.java:931)
        at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
        at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at 
java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
        at 
java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:649)
        at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2023-06-07 15:28:39,624 o.a.f.k.o.o.JobStatusObserver  [WARN 
][job-name/job-name] Exception while listing jobs
java.util.concurrent.TimeoutException
        at 
java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
        at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
        at 
org.apache.flink.kubernetes.operator.service.AbstractFlinkService.listJobs(AbstractFlinkService.java:241)
        at 
org.apache.flink.kubernetes.operator.observer.JobStatusObserver.observe(JobStatusObserver.java:70)
        at 
org.apache.flink.kubernetes.operator.observer.deployment.ApplicationObserver.observeFlinkCluster(ApplicationObserver.java:58)
        at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:73)
        at 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:53)
        at 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:120)
        at 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:56)
        at 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:145)
        at 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:103)
        at 
org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
        at 
io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:102)
        at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:139)
        at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:119)
        at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:89)
        at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:62)
        at 
io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2023-06-07 15:28:39,624 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] Observing JobManager deployment. Previous status: READY
2023-06-07 15:28:39,652 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] JobManager is being deployed
2023-06-07 15:28:39,723 o.a.f.k.o.l.AuditUtils         [INFO 
][job-name/job-name] >>> Status | Info    | STABLE          | The resource 
deployment is considered to be stable and won’t be rolled back
2023-06-07 15:28:39,724 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-07 15:28:39,724 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
2023-06-07 15:28:39,724 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] End of reconciliation


