Hi,

I recently upgraded flink-kubernetes-operator from 1.4.0 to 1.6.1 so that I
could use Flink 1.18. Since then, the operator has repeatedly logged the
following exception:

> 2023-11-21 03:26:50,505 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][sn-push/sn-push-decision-maker-log-s3-hive-prd] Resource fully reconciled, nothing to do...
> 2023-11-21 03:26:50,727 o.a.f.r.r.RestClient           [WARN ][realtime-streaming/realtime-perf-report-main-prd-test] Rest endpoint shutdown failed.
> java.util.concurrent.TimeoutException
>     at java.base/java.util.concurrent.CompletableFuture.timedGet(Unknown Source)
>     at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
>     at org.apache.flink.runtime.rest.RestClient.shutdown(RestClient.java:227)
>     at org.apache.flink.client.program.rest.RestClusterClient.close(RestClusterClient.java:270)
>     at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getTaskManagersInfo(AbstractFlinkService.java:925)
>     at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getClusterInfo(AbstractFlinkService.java:621)
>     at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeClusterInfo(AbstractFlinkDeploymentObserver.java:85)
>     at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:75)
>     at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:49)
>     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:129)
>     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:56)
>     at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:138)
>     at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:96)
>     at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
>     at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:95)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:139)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:119)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:89)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:62)
>     at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:414)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
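For context, the innermost frames show where the time runs out: AbstractFlinkService.getTaskManagersInfo closes its RestClusterClient, RestClusterClient.close calls RestClient.shutdown, and that call blocks in CompletableFuture.get until a timeout fires; the failure is then reported at WARN level as "Rest endpoint shutdown failed." The Java sketch below is a minimal illustration of that pattern only, assuming the shutdown waits on a termination future with a bounded timeout and logs the TimeoutException instead of rethrowing it (as the WARN level suggests). The class and method names (ShutdownTimeoutSketch, awaitShutdown) and the 100 ms timeout are hypothetical; this is not Flink's actual RestClient code.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.logging.Logger;

/**
 * Hypothetical illustration (not Flink code): a close path that waits a
 * bounded time for an asynchronous shutdown and logs a warning, rather than
 * failing, when that wait times out.
 */
public class ShutdownTimeoutSketch {
    private static final Logger LOG = Logger.getLogger(ShutdownTimeoutSketch.class.getName());

    /** Returns true if the shutdown completed in time, false otherwise. */
    static boolean awaitShutdown(CompletableFuture<Void> terminationFuture, Duration timeout) {
        try {
            // Bounded wait: get(timeout, unit) throws TimeoutException if the
            // future is still incomplete, matching the get -> timedGet frames.
            terminationFuture.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Assumption: the caller logs and moves on instead of rethrowing.
            LOG.warning("Rest endpoint shutdown failed: " + e);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            LOG.warning("Interrupted while waiting for shutdown");
            return false;
        } catch (ExecutionException e) {
            LOG.warning("Shutdown completed exceptionally: " + e.getCause());
            return false;
        }
    }

    public static void main(String[] args) {
        // A shutdown that has already finished: the bounded wait succeeds.
        boolean fast = awaitShutdown(CompletableFuture.completedFuture(null), Duration.ofMillis(100));
        // A shutdown that never finishes: get() times out and a warning is logged.
        boolean stuck = awaitShutdown(new CompletableFuture<>(), Duration.ofMillis(100));
        System.out.println("fast=" + fast + ", stuck=" + stuck);
    }
}
```

Running main prints fast=true, stuck=false and logs one warning for the pending future. The sketch only shows that this kind of TimeoutException arises while the client is being closed; it says nothing about which operator setting, if any, controls that shutdown wait.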

I tried increasing the REST client timeout parameter
"job.autoscaler.flink.rest-client.timeout" to 60 s, but that did not
resolve the issue.

Could you help me look into this? Thanks in advance.
