We recently upgraded from Flink 1.12.4 to 1.12.5 and are seeing unexpected
behavior after a change in JobManager leadership: the same job now shows up
twice, and one of the two copies is in the SUSPENDED state with a start time
of zero. Here's the output from the /jobs/overview endpoint:
{
  "jobs": [{
    "jid": "2db4ee6397151a1109d1ca05188a4cbb",
    "name": "analytics-flink-v1",
    "state": "RUNNING",
    "start-time": 1631106146284,
    "end-time": -1,
    "duration": 2954642,
    "last-modification": 1631106152322,
    "tasks": {
      "total": 112,
      "created": 0,
      "scheduled": 0,
      "deploying": 0,
      "running": 112,
      "finished": 0,
      "canceling": 0,
      "canceled": 0,
      "failed": 0,
      "reconciling": 0
    }
  }, {
    "jid": "2db4ee6397151a1109d1ca05188a4cbb",
    "name": "analytics-flink-v1",
    "state": "SUSPENDED",
    "start-time": 0,
    "end-time": -1,
    "duration": 1631105900760,
    "last-modification": 0,
    "tasks": {
      "total": 0,
      "created": 0,
      "scheduled": 0,
      "deploying": 0,
      "running": 0,
      "finished": 0,
      "canceling": 0,
      "canceled": 0,
      "failed": 0,
      "reconciling": 0
    }
  }]
}
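For anyone who wants to check their own cluster for the same symptom, here's a rough sketch that groups the /jobs/overview entries by jid and flags entries whose start-time is 0. The function names and the REST address (http://localhost:8081, the default REST port) are my own assumptions for illustration; adjust them for your setup. The sample data is a trimmed copy of the response above.

```python
import json
import urllib.request
from collections import defaultdict
from typing import Dict, List

# Hypothetical address; 8081 is Flink's default REST port, adjust as needed.
REST_URL = "http://localhost:8081/jobs/overview"

# Trimmed copy of the /jobs/overview response shown above.
SAMPLE = {
    "jobs": [
        {"jid": "2db4ee6397151a1109d1ca05188a4cbb", "name": "analytics-flink-v1",
         "state": "RUNNING", "start-time": 1631106146284},
        {"jid": "2db4ee6397151a1109d1ca05188a4cbb", "name": "analytics-flink-v1",
         "state": "SUSPENDED", "start-time": 0},
    ]
}


def fetch_overview(url: str = REST_URL) -> dict:
    """Fetch and decode the /jobs/overview JSON from a running cluster."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_duplicate_jobs(overview: dict) -> Dict[str, List[str]]:
    """Map each jid listed more than once to the states it is listed with."""
    states_by_jid: Dict[str, List[str]] = defaultdict(list)
    for job in overview.get("jobs", []):
        states_by_jid[job["jid"]].append(job["state"])
    return {jid: states for jid, states in states_by_jid.items() if len(states) > 1}


def find_zero_start_entries(overview: dict) -> List[dict]:
    """Return the entries whose start-time is 0, like the SUSPENDED copy above."""
    return [job for job in overview.get("jobs", []) if job.get("start-time") == 0]


if __name__ == "__main__":
    # Swap SAMPLE for fetch_overview() to check a live cluster.
    data = SAMPLE
    print(find_duplicate_jobs(data))
    print([job["state"] for job in find_zero_start_entries(data)])
```

Against the sample this reports the single jid as listed with both RUNNING and SUSPENDED, and flags only the SUSPENDED copy as having a zero start time.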

Has anyone seen this behavior before?

Thanks,
Peter
