After upgrading the Flink Kubernetes Operator from v1.11 to v1.12, upgrades
started to fail for all my jobs with the following error message:

```
Error during event processing ExecutionScope{ resource id:
ResourceID{name='my-job-checkpoint-periodic-1741010907590',
namespace='platform'}, version: 2446801878}
```

The upgrade was failing in an odd sequence:
- First, a savepoint was taken and uploaded to S3.
- After some time, that savepoint was removed from S3, but the reference to
it was not removed from the cluster CR.
- The upgrade then failed because the savepoint could not be found.
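
To confirm that sequence on a given job, the savepoint locations recorded in
the CR can be compared with what is actually present in S3. Below is a minimal
Python sketch of that comparison. The status field paths
(`status.jobStatus.savepointInfo` and `status.jobStatus.upgradeSavepointPath`)
are my assumption about the CR layout and may need adjusting, `existing` stands
for a bucket listing obtained separately, and the function names are only
illustrative.

```python
from typing import Iterable, List


def recorded_savepoint_locations(resource: dict) -> List[str]:
    """Collect savepoint locations recorded in the resource status.

    The field paths used here are assumptions about the CR layout;
    compare them with the output of `kubectl get <resource> -o json`.
    """
    job_status = (resource.get("status") or {}).get("jobStatus") or {}
    info = job_status.get("savepointInfo") or {}

    locations = []
    last = info.get("lastSavepoint") or {}
    if last.get("location"):
        locations.append(last["location"])
    for entry in info.get("savepointHistory") or []:
        if entry.get("location"):
            locations.append(entry["location"])
    if job_status.get("upgradeSavepointPath"):
        locations.append(job_status["upgradeSavepointPath"])

    # De-duplicate while keeping first-seen order.
    return list(dict.fromkeys(locations))


def dangling_savepoints(resource: dict, existing: Iterable[str]) -> List[str]:
    """Return recorded locations that are absent from the storage listing."""
    existing_set = set(existing)
    return [
        loc
        for loc in recorded_savepoint_locations(resource)
        if loc not in existing_set
    ]
```

Any location returned by `dangling_savepoints` is referenced by the CR but no
longer present in storage, which is the state described in the list above.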

Could this be related to the following change from the 1.12.0 release
announcement?
https://flink.apache.org/2025/06/03/apache-flink-kubernetes-operator-1.12.0-release-announcement/#bug-fixes-and-stability-enhancements

*Savepoint Information Update*: Fixed a bug where upgrade savepoints were
not added to the deprecated savepointInfo, ensuring accurate tracking of
savepoints during upgrades.

In case it helps, here is the complete stack trace:

```json
{
  "threadId": 352,
  "loggerFqcn": "org.apache.logging.slf4j.Log4jLogger",
  "level": "ERROR",
  "thrown": {
    "extendedStackTrace": [
      {
        "file": "Controller.java",
        "method": "cleanup",
        "line": 212,
        "exact": false,
        "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
        "class": "io.javaoperatorsdk.operator.processing.Controller",
        "version": "1.12.0"
      },
      {
        "file": "ReconciliationDispatcher.java",
        "method": "handleCleanup",
        "line": 291,
        "exact": false,
        "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
        "class":
"io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher",
        "version": "1.12.0"
      },
      {
        "file": "ReconciliationDispatcher.java",
        "method": "handleDispatch",
        "line": 89,
        "exact": false,
        "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
        "class":
"io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher",
        "version": "1.12.0"
      },
      {
        "file": "ReconciliationDispatcher.java",
        "method": "handleExecution",
        "line": 64,
        "exact": false,
        "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
        "class":
"io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher",
        "version": "1.12.0"
      },
      {
        "file": "EventProcessor.java",
        "method": "run",
        "line": 452,
        "exact": true,
        "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
        "class":
"io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor",
        "version": "1.12.0"
      },
      {
        "method": "runWorker",
        "line": -1,
        "exact": true,
        "location": "?",
        "class": "java.util.concurrent.ThreadPoolExecutor",
        "version": "?"
      },
      {
        "method": "run",
        "line": -1,
        "exact": true,
        "location": "?",
        "class": "java.util.concurrent.ThreadPoolExecutor$Worker",
        "version": "?"
      },
      {
        "method": "run",
        "line": -1,
        "exact": true,
        "location": "?",
        "class": "java.lang.Thread",
        "version": "?"
      }
    ],
    "localizedMessage": "java.lang.NullPointerException",
    "name": "io.javaoperatorsdk.operator.OperatorException",
    "cause": {
      "extendedStackTrace": [
        {
          "file": "FlinkResourceContextFactory.java",
          "method": "getFlinkStateSnapshotContext",
          "line": 96,
          "exact": false,
          "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
          "class":
"org.apache.flink.kubernetes.operator.service.FlinkResourceContextFactory",
          "version": "1.12.0"
        },
        {
          "file": "FlinkStateSnapshotController.java",
          "method": "cleanup",
          "line": 97,
          "exact": false,
          "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
          "class":
"org.apache.flink.kubernetes.operator.controller.FlinkStateSnapshotController",
          "version": "1.12.0"
        },
        {
          "file": "FlinkStateSnapshotController.java",
          "method": "cleanup",
          "line": 55,
          "exact": false,
          "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
          "class":
"org.apache.flink.kubernetes.operator.controller.FlinkStateSnapshotController",
          "version": "1.12.0"
        },
        {
          "file": "Controller.java",
          "method": "execute",
          "line": 199,
          "exact": false,
          "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
          "class": "io.javaoperatorsdk.operator.processing.Controller$2",
          "version": "1.12.0"
        },
        {
          "file": "Controller.java",
          "method": "execute",
          "line": 162,
          "exact": false,
          "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
          "class": "io.javaoperatorsdk.operator.processing.Controller$2",
          "version": "1.12.0"
        },
        {
          "file": "OperatorJosdkMetrics.java",
          "method": "timeControllerExecution",
          "line": 80,
          "exact": false,
          "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
          "class":
"org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics",
          "version": "1.12.0"
        },
        {
          "file": "Controller.java",
          "method": "cleanup",
          "line": 161,
          "exact": false,
          "location": "flink-kubernetes-operator-1.12.0-shaded.jar",
          "class": "io.javaoperatorsdk.operator.processing.Controller",
          "version": "1.12.0"
        }
      ],
      "name": "java.lang.NullPointerException",
      "commonElementCount": 7
    },
    "commonElementCount": 0,
    "message": "java.lang.NullPointerException"
  },
  "endOfBatch": false,
  "thread": "ReconcilerExecutor-flinkstatesnapshotcontroller-352",
  "loggerName":
"io.javaoperatorsdk.operator.processing.event.EventProcessor",
  "threadPriority": 5,
  "instant": {
    "epochSecond": 1750744905,
    "nanoOfSecond": 13000000
  }
}
```
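
Since the JSON layout makes the trace hard to scan, here is a small Python
sketch that flattens a log record of this shape into a conventional
"Caused by" listing. It relies only on the keys visible in the record above
(`thrown`, `cause`, `extendedStackTrace`, `name`, `message`, `class`,
`method`, `file`, `line`); the function names and the trimmed `SAMPLE` record
are illustrative, not part of the operator.

```python
import json
from typing import Iterator, List


def cause_chain(thrown: dict) -> Iterator[dict]:
    """Yield the exception and each nested cause, outermost first."""
    current = thrown
    while current:
        yield current
        current = current.get("cause")


def format_frames(exc: dict) -> List[str]:
    """Render each stack frame as one 'at class.method(file:line)' line."""
    lines = []
    for frame in exc.get("extendedStackTrace", []):
        file = frame.get("file")
        line = frame.get("line", -1)
        # Frames without a file name or with a negative line number
        # (the JDK frames in the record above) get a placeholder.
        where = f"{file}:{line}" if file and line >= 0 else "Unknown Source"
        lines.append(f"  at {frame['class']}.{frame['method']}({where})")
    return lines


def summarize(record: dict) -> str:
    """Build a plain-text summary of the exception chain in a log record."""
    out: List[str] = []
    for exc in cause_chain(record.get("thrown") or {}):
        prefix = "Caused by: " if out else ""
        message = exc.get("message")
        header = exc.get("name", "?") + (f": {message}" if message else "")
        out.append(prefix + header)
        out.extend(format_frames(exc))
    return "\n".join(out)


# Trimmed-down record with the same structure as the log entry above.
SAMPLE = """
{
  "thrown": {
    "name": "io.javaoperatorsdk.operator.OperatorException",
    "message": "java.lang.NullPointerException",
    "extendedStackTrace": [
      {"file": "Controller.java", "method": "cleanup", "line": 212,
       "class": "io.javaoperatorsdk.operator.processing.Controller"}
    ],
    "cause": {
      "name": "java.lang.NullPointerException",
      "extendedStackTrace": [
        {"file": "FlinkResourceContextFactory.java",
         "method": "getFlinkStateSnapshotContext", "line": 96,
         "class": "org.apache.flink.kubernetes.operator.service.FlinkResourceContextFactory"}
      ]
    }
  }
}
"""

print(summarize(json.loads(SAMPLE)))
```

Run against the full record, the innermost "Caused by" entry shows that the
`NullPointerException` originates in
`FlinkResourceContextFactory.getFlinkStateSnapshotContext`, called from the
`cleanup` method of `FlinkStateSnapshotController`.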


On 2025/03/04 08:29:20 Salva Alcántara wrote:
> Hey all! I recently bumped the Flink Kubernetes Operator to v1.10.0 and one
> of the things I wanted to check is the usage of the new FlinkStateSnapshot
> CRD. I confirmed that the CRD was correctly created in my cluster; however,
> I'm still seeing these logs:
>
> ```
> Starting Operator
> 2025-03-01T08:31:08.779422Z main ERROR appender CONSOLE has no parameter
> that matches element JsonLayout
> 2025-03-01T08:31:08.782927Z main ERROR Unable to locate appender
> "ConsoleAppender" for logger config "root"
> 2025-03-01 08:31:12,885 i.f.k.c.d.i.VersionUsageUtils  [WARN ] The client
> is using resource type 'flinkstatesnapshots' with unstable version 'v1beta1'
> 2025-03-01 08:31:14,180 o.a.f.k.o.c.FlinkConfigManager [WARN ]
> FlinkStateSnapshot CRD was not installed, snapshot resources will be
> disabled!
> ```
>
> I think this relates to RBAC. For what it's worth, the "FlinkStateSnapshot
> CRD was not installed" log message goes away if I switch to a cluster-wide
> installation (which handles RBAC via a ClusterRole and ClusterRoleBinding).
> However, for a namespaced installation like mine (using a non-empty array
> for watchNamespaces) something must be wrong, despite RBAC apparently
> being right, i.e.:
>
> ```
> kubectl auth can-i list flinkstatesnapshot -n a-watched-namespace
> --as=system:serviceaccount:flink-operator:flink-operator
> yes
> ```
>
> The answer is the same for any namespace within watchNamespaces (w.r.t.
> flink-operator, which is where I deploy the operator).
>
> The issue might be in this line:
>
> https://github.com/apache/flink-kubernetes-operator/blob/9eb3c385b90a5a2f08376720f3204d1784981a0c/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/KubernetesClientUtils.java#L72C31-L72C67
>
> That line does not pass any special config; maybe the idea was to use
> getKubernetesClient instead? Can anyone help troubleshoot the issue?
>
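
For anyone repeating the RBAC check from the quoted message across several
namespaces and verbs, here is a minimal Python sketch that only builds the
`kubectl auth can-i` command lines, reusing the flags shown above. The
function name, the namespace names, and the verb list are illustrative
assumptions; the set of verbs the operator actually needs is not something I
have verified.

```python
import shlex
from typing import List, Sequence


def can_i_commands(
    namespaces: Sequence[str],
    verbs: Sequence[str],
    resource: str,
    service_account: str,
    sa_namespace: str,
) -> List[List[str]]:
    """Build one `kubectl auth can-i` command per (namespace, verb) pair."""
    user = f"system:serviceaccount:{sa_namespace}:{service_account}"
    return [
        ["kubectl", "auth", "can-i", verb, resource, "-n", ns, f"--as={user}"]
        for ns in namespaces
        for verb in verbs
    ]


# Hypothetical watched namespaces and verbs, for illustration only.
commands = can_i_commands(
    namespaces=["a-watched-namespace", "another-watched-namespace"],
    verbs=["list", "watch"],
    resource="flinkstatesnapshot",
    service_account="flink-operator",
    sa_namespace="flink-operator",
)
for cmd in commands:
    # Print shell-ready lines; run them by hand or via subprocess.run(cmd).
    print(shlex.join(cmd))
```

The commands are printed rather than executed so the list can be reviewed
before running anything against the cluster.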
