Hi Gyula, and thanks for your answer,

We tried without any cluster-id reference and still got the same error message.
It seems to be related to Flink 1.16, as we have other jobs running with the
same flinkConfiguration on Flink 1.15.
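
For reference, a quick way to double-check which configuration actually reaches the job manager is to inspect the rendered flink-conf.yaml inside the pod. The snippet below is only a sketch: the namespace and pod name are placeholders, and the conf path assumes the standard Flink image layout.
```
# Show the cluster-id / HA keys in the flink-conf.yaml rendered into the job-manager pod
# (<namespace> and <jobmanager-pod> are placeholders)
kubectl -n <namespace> exec <jobmanager-pod> -- \
  grep -E 'cluster-id|high-availability' /opt/flink/conf/flink-conf.yaml

# Full job-manager log, in case an earlier exception precedes the NullPointerException
kubectl -n <namespace> logs <jobmanager-pod>
```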

PB

From: Gyula Fóra <gyula.f...@gmail.com>
Date: Friday, 31 March 2023 at 14:41
To: Pierre Bedoucha <pierre.bedou...@tv2.no>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: Re: [Kubernetes Operator] NullPointerException from
KubernetesApplicationClusterEntrypoint
I've never seen this before, but you also should not set the cluster-id in your
config, as that should be controlled by the operator itself.

Gyula

On Fri, Mar 31, 2023 at 2:39 PM Pierre Bedoucha
<pierre.bedou...@tv2.no> wrote:
Hi,

We are trying to use Flink Kubernetes Operator 1.4.0 with Flink 1.16.

However, at the job-manager deployment step we get the following error:
```
Exception in thread "main" java.lang.NullPointerException
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.shutDownAsync(ClusterEntrypoint.java:585)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:242)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:729)
        at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:86)
```
It seems to be related to the following line:
```
this.clusterId =
        checkNotNull(
                flinkConfig.getString(KubernetesConfigOptions.CLUSTER_ID),
                "ClusterId must be specified!");
```
We specified the CLUSTER_ID, but it seems that the flinkConfig object is not
handled correctly.

We have the following flinkConfiguration defined in deployment.yaml:
```
spec:
  flinkConfiguration:
    execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
    execution.checkpointing.interval: 120s
    execution.checkpointing.min-pause: 120s
    execution.checkpointing.mode: AT_LEAST_ONCE
    execution.checkpointing.snapshot-compression: "false"
    execution.checkpointing.timeout: 3000s
    execution.checkpointing.tolerable-failed-checkpoints: "5"
    execution.checkpointing.unaligned: "false"
    fs.hdfs.hadoopconf: /opt/hadoop-conf/
    high-availability.storageDir: gs://<path/to/environment>/ha
    high-availability: kubernetes
    high-availability.cluster-id: <cluster-id>
    kubernetes.operator.periodic.savepoint.interval: 6h
    kubernetes.operator.savepoint.history.max.age: 72h
    kubernetes.operator.savepoint.history.max.count: "15"
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: "2112"
    metrics.reporters: prom
    rest.flamegraph.enabled: "false"
    state.backend: rocksdb
    state.backend.incremental: "false"
    state.backend.rocksdb.localdir: /rocksdb
    state.checkpoint-storage: filesystem
    state.checkpoints.dir: gs://<path/to/environment>/checkpoints
    state.savepoints.dir: gs://<path/to/environment>/savepoints
    taskmanager.memory.managed.fraction: "0"
    taskmanager.network.memory.buffer-debloat.enabled: "false"
    taskmanager.network.memory.buffer-debloat.period: "200"
    taskmanager.network.memory.buffers-per-channel: "2"
    taskmanager.network.memory.floating-buffers-per-gate: "8"
    taskmanager.network.memory.max-buffers-per-channel: "10"
    taskmanager.network.sort-shuffle.min-buffers: "512"
    taskmanager.numberOfTaskSlots: "1"
    kubernetes.taskmanager.cpu.limit-factor: "4"
    kubernetes.taskmanager.cpu: "0.5"
    kubernetes.cluster-id: <cluster-id>
```
Has anyone encountered this issue before?

Thanks,
PB
