Hi Piotr,

Jobmanager logs are attached to this email. The only thing that jumps out to me 
is this:

09/08/2021 09:02:26.240 -0400 ERROR org.apache.flink.runtime.history.FsJobArchivist Failed to archive job.
  java.io.IOException: File already exists:s3p://flink-s3-bucket/history/2db4ee6397151a1109d1ca05188a4cbb

This happened days after the Flink upgrade, and not just once: across all our 
Flink clusters I've seen it three times. In this case the JobManager lost 
leadership because a deployment of our ZooKeeper cluster led to a brief 
connection loss, so the new leader election is expected.

Thanks,
Peter


From: Piotr Nowojski <pnowoj...@apache.org>
Date: Thursday, September 9, 2021 at 12:39 AM
To: Peter Westermann <no.westerm...@genesys.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: Re: Duplicate copies of job in Flink UI/API

Hi Peter,

Can you provide the relevant JobManager logs? Can you also write down the 
steps you took before the failure happened? Did the failure occur while 
upgrading Flink, or after the upgrade?

Best,
Piotrek

On Wed, Sep 8, 2021 at 16:11, Peter Westermann 
<no.westerm...@genesys.com> wrote:
We recently upgraded from Flink 1.12.4 to 1.12.5 and are seeing some odd 
behavior after a change in JobManager leadership: two copies of the same job 
appear, one of which is in SUSPENDED state with a start time of zero. 
Here's the output from the /jobs/overview endpoint:
{
  "jobs": [{
    "jid": "2db4ee6397151a1109d1ca05188a4cbb",
    "name": "analytics-flink-v1",
    "state": "RUNNING",
    "start-time": 1631106146284,
    "end-time": -1,
    "duration": 2954642,
    "last-modification": 1631106152322,
    "tasks": {
      "total": 112,
      "created": 0,
      "scheduled": 0,
      "deploying": 0,
      "running": 112,
      "finished": 0,
      "canceling": 0,
      "canceled": 0,
      "failed": 0,
      "reconciling": 0
    }
  }, {
    "jid": "2db4ee6397151a1109d1ca05188a4cbb",
    "name": "analytics-flink-v1",
    "state": "SUSPENDED",
    "start-time": 0,
    "end-time": -1,
    "duration": 1631105900760,
    "last-modification": 0,
    "tasks": {
      "total": 0,
      "created": 0,
      "scheduled": 0,
      "deploying": 0,
      "running": 0,
      "finished": 0,
      "canceling": 0,
      "canceled": 0,
      "failed": 0,
      "reconciling": 0
    }
  }]
}
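
In case it helps anyone check their own clusters for the same symptom, here is 
a minimal Python sketch that flags job IDs listed more than once in a 
/jobs/overview response. The function name find_duplicate_jobs and the trimmed 
sample payload are just illustrative; only the "jobs", "jid" and "state" fields 
come from the response above.

```python
import json
from collections import defaultdict


def find_duplicate_jobs(overview):
    """Return {jid: [states]} for every job ID listed more than once."""
    by_jid = defaultdict(list)
    for job in overview.get("jobs", []):
        by_jid[job["jid"]].append(job["state"])
    return {jid: states for jid, states in by_jid.items() if len(states) > 1}


# Trimmed copy of the response above (only the fields the check needs).
sample = json.loads("""
{"jobs": [
  {"jid": "2db4ee6397151a1109d1ca05188a4cbb", "state": "RUNNING", "start-time": 1631106146284},
  {"jid": "2db4ee6397151a1109d1ca05188a4cbb", "state": "SUSPENDED", "start-time": 0}
]}
""")

duplicates = find_duplicate_jobs(sample)
for jid, states in duplicates.items():
    print(jid, states)
```

In practice the payload would presumably be fetched from the JobManager's REST 
endpoint instead of being pasted in as a string.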

Has anyone seen this behavior before?

Thanks,
Peter
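
Note on the attached log: it is listed newest-first, and each record is 
wrapped over several lines. A rough Python sketch for putting it into 
chronological order is below. It assumes every record starts with a timestamp 
of the form "09/08/2021 09:02:26.240 -0400" and that continuation lines (such 
as stack-trace frames) do not; the helper names are made up for illustration.

```python
import re

# A record starts with a timestamp such as "09/08/2021 09:02:26.240 -0400".
RECORD_START = re.compile(
    r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4} "
)


def split_records(lines):
    """Group wrapped lines into records; a timestamp line opens a new record."""
    records = []
    for line in lines:
        if RECORD_START.match(line) or not records:
            records.append([line])
        else:
            records[-1].append(line)
    return records


def chronological(lines):
    """Reverse the record order while keeping each record's lines together."""
    return [line for record in reversed(split_records(lines)) for line in record]


# Two records taken from the log below, in their original (newest-first) order.
sample = [
    "09/08/2021 09:02:26.192 -0400 INFO ",
    "org.apache.flink.runtime.dispatcher.StandaloneDispatcher Job ",
    "2db4ee6397151a1109d1ca05188a4cbb reached terminal state SUSPENDED.",
    "09/08/2021 09:02:22.724 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster ",
    "Stopping the JobMaster for job ",
    "analytics-flink-v1(2db4ee6397151a1109d1ca05188a4cbb).",
]

ordered = chronological(sample)
print("\n".join(ordered))
```

Reversing rather than sorting keeps records that share the same millisecond in 
the opposite of their listed order, which is only a guess at the original 
emission order.
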
09/08/2021 09:02:31.015 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Request slot 
with profile ResourceProfile{UNKNOWN} for job 2db4ee6397151a1109d1ca05188a4cbb 
with allocation id fbddb90b669081bd9907c835f1906a79.
09/08/2021 09:02:31.015 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Request slot 
with profile ResourceProfile{UNKNOWN} for job 2db4ee6397151a1109d1ca05188a4cbb 
with allocation id ee62d44923180b0ac66e10ed170f0af3.
09/08/2021 09:02:31.015 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Request slot 
with profile ResourceProfile{UNKNOWN} for job 2db4ee6397151a1109d1ca05188a4cbb 
with allocation id 13d0e72b41883dfb84e866645b07dc92.
09/08/2021 09:02:31.015 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Request slot 
with profile ResourceProfile{UNKNOWN} for job 2db4ee6397151a1109d1ca05188a4cbb 
with allocation id ea4064056de27327a2037f4d71aa9e5c.
09/08/2021 09:02:31.015 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Request slot 
with profile ResourceProfile{UNKNOWN} for job 2db4ee6397151a1109d1ca05188a4cbb 
with allocation id cfcc6014e93f09884ad5f61e4a108e8d.
09/08/2021 09:02:31.014 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Request slot 
with profile ResourceProfile{UNKNOWN} for job 2db4ee6397151a1109d1ca05188a4cbb 
with allocation id 3dd4d17bf50232c95f178d6a235f2dc1.
09/08/2021 09:02:31.014 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Request slot 
with profile ResourceProfile{UNKNOWN} for job 2db4ee6397151a1109d1ca05188a4cbb 
with allocation id 13a7257a9aedd2ce2ca5267f93586763.
09/08/2021 09:02:31.014 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Requesting new slot 
[SlotRequestId{f9b28f7479a60f286846fd9d5f1f4e8e}] and profile 
ResourceProfile{UNKNOWN} with allocation id fbddb90b669081bd9907c835f1906a79 
from resource manager.
09/08/2021 09:02:31.014 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Requesting new slot 
[SlotRequestId{1df0f0709f439fc647c995a36d8c60a7}] and profile 
ResourceProfile{UNKNOWN} with allocation id ee62d44923180b0ac66e10ed170f0af3 
from resource manager.
09/08/2021 09:02:31.014 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Requesting new slot 
[SlotRequestId{2a441ce02b709ffacdcc319d715239d8}] and profile 
ResourceProfile{UNKNOWN} with allocation id 13d0e72b41883dfb84e866645b07dc92 
from resource manager.
09/08/2021 09:02:31.014 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Requesting new slot 
[SlotRequestId{9e3606665fc9b08e63a1cb399863534b}] and profile 
ResourceProfile{UNKNOWN} with allocation id ea4064056de27327a2037f4d71aa9e5c 
from resource manager.
09/08/2021 09:02:31.014 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Requesting new slot 
[SlotRequestId{802408aafe4e015591902c107cecef02}] and profile 
ResourceProfile{UNKNOWN} with allocation id cfcc6014e93f09884ad5f61e4a108e8d 
from resource manager.
09/08/2021 09:02:31.014 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Requesting new slot 
[SlotRequestId{e0a0f561e532d36e64feafee9eae8100}] and profile 
ResourceProfile{UNKNOWN} with allocation id 3dd4d17bf50232c95f178d6a235f2dc1 
from resource manager.
09/08/2021 09:02:31.013 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Requesting new slot 
[SlotRequestId{4cfc987782b4aed47bb5d98394601db6}] and profile 
ResourceProfile{UNKNOWN} with allocation id 13a7257a9aedd2ce2ca5267f93586763 
from resource manager.
09/08/2021 09:02:31.013 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Request slot 
with profile ResourceProfile{UNKNOWN} for job 2db4ee6397151a1109d1ca05188a4cbb 
with allocation id 2e742af203d1be847d06c643f3984b54.
09/08/2021 09:02:31.013 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Requesting new slot 
[SlotRequestId{0f25b099a7dd972a69c31532798e19ab}] and profile 
ResourceProfile{UNKNOWN} with allocation id 2e742af203d1be847d06c643f3984b54 
from resource manager.
09/08/2021 09:02:31.013 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
JobManager successfully registered at ResourceManager, leader id: 
b0f998b265508bb4cd715749318a4ce4.
09/08/2021 09:02:31.013 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Registered 
job manager 
038f7d4cea297118551aed586a338a49://0aaaf7660b7f4414af376af4f91f9500:50001/user/rpc/jobmanager_4
 for job 2db4ee6397151a1109d1ca05188a4cbb.
09/08/2021 09:02:31.004 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Registering 
job manager 
038f7d4cea297118551aed586a338a49://0aaaf7660b7f4414af376af4f91f9500:50001/user/rpc/jobmanager_4
 for job 2db4ee6397151a1109d1ca05188a4cbb.
09/08/2021 09:02:31.004 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService Starting 
DefaultLeaderRetrievalService with 
ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/2db4ee6397151a1109d1ca05188a4cbb/job_manager_lock'}.
09/08/2021 09:02:31.004 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Resolved ResourceManager address, beginning registration
09/08/2021 09:02:31.004 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Connecting to ResourceManager 
akka.ssl.tcp://0aaaf7660b7f4414af376af4f91f9500:50001/user/rpc/resourcemanager_0(b0f998b265508bb4cd715749318a4ce4)
09/08/2021 09:02:31.002 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Cannot serve slot 
request, no ResourceManager connected. Adding as pending request 
[SlotRequestId{f9b28f7479a60f286846fd9d5f1f4e8e}]
09/08/2021 09:02:31.001 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Cannot serve slot 
request, no ResourceManager connected. Adding as pending request 
[SlotRequestId{1df0f0709f439fc647c995a36d8c60a7}]
09/08/2021 09:02:31.001 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Cannot serve slot 
request, no ResourceManager connected. Adding as pending request 
[SlotRequestId{2a441ce02b709ffacdcc319d715239d8}]
09/08/2021 09:02:31.001 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Cannot serve slot 
request, no ResourceManager connected. Adding as pending request 
[SlotRequestId{9e3606665fc9b08e63a1cb399863534b}]
09/08/2021 09:02:31.001 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Cannot serve slot 
request, no ResourceManager connected. Adding as pending request 
[SlotRequestId{802408aafe4e015591902c107cecef02}]
09/08/2021 09:02:31.000 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Cannot serve slot 
request, no ResourceManager connected. Adding as pending request 
[SlotRequestId{e0a0f561e532d36e64feafee9eae8100}]
09/08/2021 09:02:31.000 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Cannot serve slot 
request, no ResourceManager connected. Adding as pending request 
[SlotRequestId{4cfc987782b4aed47bb5d98394601db6}]
09/08/2021 09:02:31.000 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Cannot serve slot 
request, no ResourceManager connected. Adding as pending request 
[SlotRequestId{0f25b099a7dd972a69c31532798e19ab}]
09/08/2021 09:02:30.997 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Starting scheduling with scheduling strategy 
[org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
09/08/2021 09:02:30.996 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Starting execution of job analytics-flink-v1 (2db4ee6397151a1109d1ca05188a4cbb) 
under job master id 9ed35c1fb07f037fed21d31d35cc4abf.
09/08/2021 09:02:30.996 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService Starting 
DefaultLeaderRetrievalService with 
ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/resource_manager_lock'}.
09/08/2021 09:02:30.992 -0400 INFO 
org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl JobManager runner for 
job analytics-flink-v1 (2db4ee6397151a1109d1ca05188a4cbb) was granted 
leadership with session id ed21d31d-35cc-4abf-9ed3-5c1fb07f037f at 
akka.ssl.tcp://0aaaf7660b7f4414af376af4f91f9500:50001/user/rpc/jobmanager_4.
09/08/2021 09:02:30.990 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Using failover strategy 
org.apache.flink.runtime.executiongraph.failover.flip1.RestartAllFailoverStrategy@7e7f0b53
 for analytics-flink-v1 (2db4ee6397151a1109d1ca05188a4cbb).
09/08/2021 09:02:30.346 -0400 INFO 
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore Trying to 
retrieve checkpoint 318603.
09/08/2021 09:02:29.697 -0400 INFO 
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore Trying to 
retrieve checkpoint 318602.
09/08/2021 09:02:28.861 -0400 INFO 
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore Trying to 
retrieve checkpoint 318601.
09/08/2021 09:02:27.469 -0400 INFO 
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore Trying to 
retrieve checkpoint 318600.
09/08/2021 09:02:27.469 -0400 INFO 
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore Trying to 
fetch 4 checkpoints from storage.
09/08/2021 09:02:27.469 -0400 INFO 
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore Found 4 
checkpoints in 
ZooKeeperStateHandleStore{namespace='analytics-flink/analytics-flink-v1/3069/checkpoints/2db4ee6397151a1109d1ca05188a4cbb'}.
09/08/2021 09:02:27.454 -0400 INFO 
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore Recovering 
checkpoints from 
ZooKeeperStateHandleStore{namespace='analytics-flink/analytics-flink-v1/3069/checkpoints/2db4ee6397151a1109d1ca05188a4cbb'}.
09/08/2021 09:02:27.452 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Using application-defined state backend: 
RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 
's3p://inin-prod-aps1-analytics/analytics-flink/analytics-flink-v1/3069/checkpoints/HASH',
 savepoints: 
's3p://inin-prod-aps1-analytics/analytics-flink/savepoints/analytics-flink-v1', 
asynchronous: TRUE, fileStateThreshold: 1048576), localRocksDbDirectories=null, 
enableIncrementalCheckpointing=TRUE, numberOfTransferThreads=2, 
writeBatchSize=2097152}
09/08/2021 09:02:27.451 -0400 INFO 
org.apache.flink.contrib.streaming.state.RocksDBStateBackend Using 
application-defined options factory: AnalyticsRocksOptionsFactory 
[baseline=FLASH_SSD_OPTIMIZED, compressionType=ZSTD_COMPRESSION].
09/08/2021 09:02:27.451 -0400 INFO 
org.apache.flink.contrib.streaming.state.RocksDBStateBackend Using predefined 
options: FLASH_SSD_OPTIMIZED.
09/08/2021 09:02:27.451 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Using job/cluster config to configure application-defined state backend: 
RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 
's3p://inin-prod-aps1-analytics/analytics-flink/analytics-flink-v1/3069/checkpoints/HASH',
 savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), 
localRocksDbDirectories=null, enableIncrementalCheckpointing=UNDEFINED, 
numberOfTransferThreads=-1, writeBatchSize=-1}
09/08/2021 09:02:27.449 -0400 INFO org.apache.flink.runtime.util.ZooKeeperUtils 
Initialized DefaultCompletedCheckpointStore in 
'ZooKeeperStateHandleStore{namespace='analytics-flink/analytics-flink-v1/3069/checkpoints/2db4ee6397151a1109d1ca05188a4cbb'}'
 with /checkpoints/2db4ee6397151a1109d1ca05188a4cbb.
09/08/2021 09:02:27.446 -0400 INFO 
org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology Built 1 
pipelined regions in 0 ms
09/08/2021 09:02:27.441 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Successfully ran initialization on master in 0 ms.
09/08/2021 09:02:27.441 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Running initialization on master for job analytics-flink-v1 
(2db4ee6397151a1109d1ca05188a4cbb).
09/08/2021 09:02:27.440 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Using restart back off time strategy 
FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2147483647, 
backoffTimeMS=1000) for analytics-flink-v1 (2db4ee6397151a1109d1ca05188a4cbb).
09/08/2021 09:02:27.430 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Initializing job analytics-flink-v1 (2db4ee6397151a1109d1ca05188a4cbb).
09/08/2021 09:02:27.430 -0400 INFO 
org.apache.flink.runtime.rpc.akka.AkkaRpcService Starting RPC endpoint for 
org.apache.flink.runtime.jobmaster.JobMaster at 
akka://flink/user/rpc/jobmanager_4 .
09/08/2021 09:02:27.429 -0400 INFO 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService Starting 
DefaultLeaderElectionService with 
ZooKeeperLeaderElectionDriver{leaderPath='/leader/2db4ee6397151a1109d1ca05188a4cbb/job_manager_lock'}.
09/08/2021 09:02:26.535 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Registering 
TaskManager with ResourceID 10.105.236.109:50004-1e912c 
(akka.ssl.tcp://f32954a85e3a78e12ad552dafb6935b7:50004/user/rpc/taskmanager_0) 
at ResourceManager
09/08/2021 09:02:26.331 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Registering 
TaskManager with ResourceID 10.105.244.116:50004-6095c5 
(akka.ssl.tcp://40949cf374b33869a2d6b1e0fd532b7c:50004/user/rpc/taskmanager_0) 
at ResourceManager
09/08/2021 09:02:26.283 -0400 INFO 
org.apache.flink.runtime.rpc.akka.AkkaRpcService Starting RPC endpoint for 
org.apache.flink.runtime.dispatcher.StandaloneDispatcher at 
akka://flink/user/rpc/dispatcher_3 .
09/08/2021 09:02:26.281 -0400 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess 
Successfully recovered 1 persisted job graphs.
09/08/2021 09:02:26.281 -0400 INFO 
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore Recovered 
JobGraph(jobId: 2db4ee6397151a1109d1ca05188a4cbb).
09/08/2021 09:02:26.241 -0400 INFO 
org.apache.flink.runtime.dispatcher.StandaloneDispatcher Could not archive 
completed job analytics-flink-v1(2db4ee6397151a1109d1ca05188a4cbb) to the 
history server.
  java.io.IOException: File already 
exists:s3p://flink-s3-bucket/history/2db4ee6397151a1109d1ca05188a4cbb
        at c.f.p.h.s.PrestoS3FileSystem.create(PrestoS3FileSystem.java:357)
        at o.a.h.fs.FileSystem.create(FileSystem.java:1169)
        at o.a.h.fs.FileSystem.create(FileSystem.java:1149)
        at o.a.h.fs.FileSystem.create(FileSystem.java:1038)
        at o.a.f.f.s.c.HadoopFileSystem.create(HadoopFileSystem.java:154)
        at o.a.f.f.s.c.HadoopFileSystem.create(HadoopFileSystem.java:37)
        at 
o.a.f.c.f.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:170)
        at o.a.f.r.h.FsJobArchivist.archiveJob(FsJobArchivist.java:73)
        at 
o.a.f.r.d.JsonResponseHistoryServerArchivist.lambda$archiveExecutionGraph$0(JsonResponseHistoryServerArchivist.java:57)
        at 
o.a.f.u.f.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:49)
        ... 4 common frames omitted
  Wrapped by: j.l.RuntimeException: java.io.IOException: File already 
exists:s3p://flink-s3-bucket/history/2db4ee6397151a1109d1ca05188a4cbb
        at o.a.f.u.ExceptionUtils.rethrow(ExceptionUtils.java:316)
        at 
o.a.f.u.f.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:51)
        at j.u.c.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
        ... 3 common frames omitted
  Wrapped by: j.u.c.CompletionException: java.lang.RuntimeException: 
java.io.IOException: File already 
exists:s3p://flink-s3-bucket/history/2db4ee6397151a1109d1ca05188a4cbb
        at j.u.c.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at j.u.c.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at j.u.c.CompletableFuture$AsyncRun.run(CompletableFuture.java:1643)
        at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
09/08/2021 09:02:26.240 -0400 ERROR 
org.apache.flink.runtime.history.FsJobArchivist Failed to archive job.
  java.io.IOException: File already 
exists:s3p://flink-s3-bucket/history/2db4ee6397151a1109d1ca05188a4cbb
  at c.f.p.h.s.PrestoS3FileSystem.create(PrestoS3FileSystem.java:357)
  at o.a.h.fs.FileSystem.create(FileSystem.java:1169)
  at o.a.h.fs.FileSystem.create(FileSystem.java:1149)
  at o.a.h.fs.FileSystem.create(FileSystem.java:1038)
  at o.a.f.f.s.c.HadoopFileSystem.create(HadoopFileSystem.java:154)
  at o.a.f.f.s.c.HadoopFileSystem.create(HadoopFileSystem.java:37)
  at 
o.a.f.c.f.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:170)
  at o.a.f.r.h.FsJobArchivist.archiveJob(FsJobArchivist.java:73)
  at 
o.a.f.r.d.JsonResponseHistoryServerArchivist.lambda$archiveExecutionGraph$0(JsonResponseHistoryServerArchivist.java:57)
  at o.a.f.u.f.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:49)
  at j.u.c.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
  at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
09/08/2021 09:02:26.227 -0400 INFO 
org.apache.flink.runtime.jobmanager.ZooKeeperJobGraphStoreWatcher Stopping 
ZooKeeperJobGraphStoreWatcher
09/08/2021 09:02:26.212 -0400 INFO 
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore Stopping 
DefaultJobGraphStore.
09/08/2021 09:02:26.211 -0400 INFO 
org.apache.flink.runtime.dispatcher.StandaloneDispatcher Stopped dispatcher 
akka.ssl.tcp://1dd8e04affb77f1da7ab5f7c9202b570:50001/user/rpc/dispatcher_1.
09/08/2021 09:02:26.211 -0400 INFO 
org.apache.flink.runtime.rest.handler.legacy.backpressure.BackPressureRequestCoordinator
 Shutting down back pressure request coordinator.
09/08/2021 09:02:26.201 -0400 INFO 
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore Released job graph 
2db4ee6397151a1109d1ca05188a4cbb from 
ZooKeeperStateHandleStore{namespace='analytics-flink/analytics-flink-v1/3069/jobgraphs'}.
09/08/2021 09:02:26.192 -0400 INFO 
org.apache.flink.runtime.dispatcher.StandaloneDispatcher Job 
2db4ee6397151a1109d1ca05188a4cbb reached terminal state SUSPENDED.
09/08/2021 09:02:26.191 -0400 INFO 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver Closing 
ZooKeeperLeaderElectionDriver{leaderPath='/leader/2db4ee6397151a1109d1ca05188a4cbb/job_manager_lock'}
09/08/2021 09:02:26.191 -0400 INFO 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService Stopping 
DefaultLeaderElectionService.
09/08/2021 09:02:26.190 -0400 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess 
Trying to recover job with job id 2db4ee6397151a1109d1ca05188a4cbb.
09/08/2021 09:02:26.190 -0400 INFO 
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore Retrieved job ids 
[2db4ee6397151a1109d1ca05188a4cbb] from 
ZooKeeperStateHandleStore{namespace='analytics-flink/analytics-flink-v1/3069/jobgraphs'}
09/08/2021 09:02:26.189 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Registering 
TaskManager with ResourceID 10.105.217.197:50004-6176ad 
(akka.ssl.tcp://0a11cfda1c06f5500298d05e94d88c13:50004/user/rpc/taskmanager_0) 
at ResourceManager
09/08/2021 09:02:26.182 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Stopping SlotPool.
09/08/2021 09:02:26.182 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Close ResourceManager connection fb0bf924810087592ad931aecb0387b1: Stopping 
JobMaster for job analytics-flink-v1(2db4ee6397151a1109d1ca05188a4cbb)..
09/08/2021 09:02:26.182 -0400 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess 
Recover all persisted job graphs.
09/08/2021 09:02:26.182 -0400 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess Start 
SessionDispatcherLeaderProcess.
09/08/2021 09:02:26.181 -0400 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl Suspending SlotPool.
09/08/2021 09:02:26.178 -0400 INFO 
org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter Shutting down.
09/08/2021 09:02:26.164 -0400 WARN org.apache.flink.metrics.MetricGroup Name 
collision: Group already contains a Metric with the name 'taskSlotsTotal'. 
Metric will not be reported.[jobmanager, 10.105.221.188]
09/08/2021 09:02:26.164 -0400 WARN org.apache.flink.metrics.MetricGroup Name 
collision: Group already contains a Metric with the name 'taskSlotsAvailable'. 
Metric will not be reported.[jobmanager, 10.105.221.188]
09/08/2021 09:02:26.163 -0400 INFO 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl Starting 
the SlotManager.
09/08/2021 09:02:26.163 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager 
ResourceManager 
akka.ssl.tcp://0aaaf7660b7f4414af376af4f91f9500:50001/user/rpc/resourcemanager_0
 was granted leadership with fencing token b0f998b265508bb4cd715749318a4ce4
09/08/2021 09:02:26.161 -0400 INFO 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper was reconnected. Leader election can be restarted.
09/08/2021 09:02:26.161 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver 
Connection to ZooKeeper was reconnected. Leader retrieval can be restarted.
09/08/2021 09:02:26.160 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver 
Connection to ZooKeeper was reconnected. Leader retrieval can be restarted.
09/08/2021 09:02:26.160 -0400 INFO 
org.apache.flink.runtime.jobmanager.ZooKeeperJobGraphStoreWatcher ZooKeeper 
connection RECONNECTED. Changes to the submitted job graphs are monitored again.
09/08/2021 09:02:26.160 -0400 INFO 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper was reconnected. Leader election can be restarted.
09/08/2021 09:02:26.160 -0400 INFO 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper was reconnected. Leader election can be restarted.
09/08/2021 09:02:26.159 -0400 INFO 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper was reconnected. Leader election can be restarted.
09/08/2021 09:02:26.159 -0400 INFO 
org.apache.flink.shaded.curator4.org.apache.curator.framework.state.ConnectionStateManager
 State change: RECONNECTED
09/08/2021 09:02:26.159 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Session 
establishment complete on server zkeeper-2/10.105.219.52:2181, sessionid = 
0x30000016c7d978a, negotiated timeout = 10000
09/08/2021 09:02:26.158 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Socket 
connection established to zkeeper-2/10.105.219.52:2181, initiating session
09/08/2021 09:02:26.157 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Opening 
socket connection to server zkeeper-2/10.105.219.52:2181
09/08/2021 09:02:25.274 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver 
Connection to ZooKeeper was reconnected. Leader retrieval can be restarted.
09/08/2021 09:02:25.273 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver 
Connection to ZooKeeper was reconnected. Leader retrieval can be restarted.
09/08/2021 09:02:25.273 -0400 INFO 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper was reconnected. Leader election can be restarted.
09/08/2021 09:02:25.273 -0400 INFO 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper was reconnected. Leader election can be restarted.
09/08/2021 09:02:25.273 -0400 INFO 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper was reconnected. Leader election can be restarted.
09/08/2021 09:02:25.273 -0400 INFO 
org.apache.flink.shaded.curator4.org.apache.curator.framework.state.ConnectionStateManager
 State change: RECONNECTED
09/08/2021 09:02:25.273 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Session 
establishment complete on server zkeeper-1/10.105.253.30:2181, sessionid = 
0x30000016c7d9789, negotiated timeout = 10000
09/08/2021 09:02:25.271 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Socket 
connection established to zkeeper-1/10.105.253.30:2181, initiating session
09/08/2021 09:02:25.271 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Opening 
socket connection to server zkeeper-1/10.105.253.30:2181
09/08/2021 09:02:25.120 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Socket error 
occurred: zkeeper-3/10.105.233.83:2181: Connection refused
09/08/2021 09:02:25.119 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Opening 
socket connection to server zkeeper-3/10.105.233.83:2181
09/08/2021 09:02:25.027 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Socket error 
occurred: zkeeper-3/10.105.233.83:2181: Connection refused
09/08/2021 09:02:25.026 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Opening 
socket connection to server zkeeper-3/10.105.233.83:2181
09/08/2021 09:02:23.738 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Unable to 
read additional data from server sessionid 0x30000016c7d978a, likely server has 
closed socket, closing socket connection and attempting reconnect
09/08/2021 09:02:23.737 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Socket 
connection established to zkeeper-1/10.105.253.30:2181, initiating session
09/08/2021 09:02:23.737 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Opening 
socket connection to server zkeeper-1/10.105.253.30:2181
09/08/2021 09:02:23.646 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Unable to 
read additional data from server sessionid 0x30000016c7d9789, likely server has 
closed socket, closing socket connection and attempting reconnect
09/08/2021 09:02:23.645 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Socket 
connection established to zkeeper-2/10.105.219.52:2181, initiating session
09/08/2021 09:02:23.644 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Opening 
socket connection to server zkeeper-2/10.105.219.52:2181
09/08/2021 09:02:23.507 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Unable to 
read additional data from server sessionid 0x30000016c7d9789, likely server has 
closed socket, closing socket connection and attempting reconnect
09/08/2021 09:02:23.506 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Socket 
connection established to zkeeper-1/10.105.253.30:2181, initiating session
09/08/2021 09:02:23.505 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Opening 
socket connection to server zkeeper-1/10.105.253.30:2181
09/08/2021 09:02:22.879 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Unable to 
read additional data from server sessionid 0x30000016c7d978a, likely server has 
closed socket, closing socket connection and attempting reconnect
09/08/2021 09:02:22.878 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Socket 
connection established to zkeeper-2/10.105.219.52:2181, initiating session
09/08/2021 09:02:22.877 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Opening 
socket connection to server zkeeper-2/10.105.219.52:2181
09/08/2021 09:02:22.742 -0400 INFO 
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore Suspending
09/08/2021 09:02:22.725 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver Closing 
ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/resource_manager_lock'}.
09/08/2021 09:02:22.725 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService Stopping 
DefaultLeaderRetrievalService.
09/08/2021 09:02:22.724 -0400 INFO org.apache.flink.runtime.jobmaster.JobMaster 
Stopping the JobMaster for job 
analytics-flink-v1(2db4ee6397151a1109d1ca05188a4cbb).
09/08/2021 09:02:22.679 -0400 WARN 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper suspended. The contender https://10.105.245.207:8081 no 
longer participates in the leader election.
09/08/2021 09:02:22.679 -0400 WARN 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver 
Connection to ZooKeeper suspended. Can no longer retrieve the leader from 
ZooKeeper.
09/08/2021 09:02:22.679 -0400 WARN 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver 
Connection to ZooKeeper suspended. Can no longer retrieve the leader from 
ZooKeeper.
09/08/2021 09:02:22.679 -0400 WARN 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver 
Connection to ZooKeeper suspended. Can no longer retrieve the leader from 
ZooKeeper.
09/08/2021 09:02:22.679 -0400 WARN 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper suspended. The contender LeaderContender: 
JobManagerRunnerImpl no longer participates in the leader election.
09/08/2021 09:02:22.679 -0400 WARN 
org.apache.flink.runtime.jobmanager.ZooKeeperJobGraphStoreWatcher ZooKeeper 
connection SUSPENDING. Changes to the submitted job graphs are not monitored 
(temporarily).
09/08/2021 09:02:22.677 -0400 INFO 
org.apache.flink.runtime.dispatcher.StandaloneDispatcher Stopping all currently 
running jobs of dispatcher 
akka.ssl.tcp://1dd8e04affb77f1da7ab5f7c9202b570:50001/user/rpc/dispatcher_1.
09/08/2021 09:02:22.677 -0400 INFO 
org.apache.flink.runtime.dispatcher.StandaloneDispatcher Stopping dispatcher 
akka.ssl.tcp://1dd8e04affb77f1da7ab5f7c9202b570:50001/user/rpc/dispatcher_1.
09/08/2021 09:02:22.677 -0400 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess 
Stopping SessionDispatcherLeaderProcess.
09/08/2021 09:02:22.677 -0400 INFO 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl Suspending 
the SlotManager.
09/08/2021 09:02:22.676 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver Closing 
ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/2db4ee6397151a1109d1ca05188a4cbb/job_manager_lock'}.
09/08/2021 09:02:22.676 -0400 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService Stopping 
DefaultLeaderRetrievalService.
09/08/2021 09:02:22.676 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager 
ResourceManager 
akka.ssl.tcp://1dd8e04affb77f1da7ab5f7c9202b570:50001/user/rpc/resourcemanager_0
 was revoked leadership. Clearing fencing token.
09/08/2021 09:02:22.676 -0400 WARN 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper suspended. The contender LeaderContender: 
DefaultDispatcherRunner no longer participates in the leader election.
09/08/2021 09:02:22.676 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Closing 
TaskExecutor connection 10.105.217.197:50004-6176ad because: ResourceManager 
leader changed to new address null
09/08/2021 09:02:22.675 -0400 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager Closing 
TaskExecutor connection 10.105.236.109:50004-1e912c because: ResourceManager 
leader changed to new address null
09/08/2021 09:02:22.675 -0400 WARN 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper suspended. The contender LeaderContender: 
StandaloneResourceManager no longer participates in the leader election.
09/08/2021 09:02:22.675 -0400 INFO 
org.apache.flink.shaded.curator4.org.apache.curator.framework.state.ConnectionStateManager
 State change: SUSPENDED
09/08/2021 09:02:22.673 -0400 WARN 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver 
Connection to ZooKeeper suspended. Can no longer retrieve the leader from 
ZooKeeper.
09/08/2021 09:02:22.673 -0400 WARN 
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver 
Connection to ZooKeeper suspended. Can no longer retrieve the leader from 
ZooKeeper.
09/08/2021 09:02:22.673 -0400 WARN 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper suspended. The contender LeaderContender: 
StandaloneResourceManager no longer participates in the leader election.
09/08/2021 09:02:22.673 -0400 WARN 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper suspended. The contender https://10.105.221.188:8081 no 
longer participates in the leader election.
09/08/2021 09:02:22.673 -0400 WARN 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver 
Connection to ZooKeeper suspended. The contender LeaderContender: 
DefaultDispatcherRunner no longer participates in the leader election.
09/08/2021 09:02:22.673 -0400 INFO 
org.apache.flink.shaded.curator4.org.apache.curator.framework.state.ConnectionStateManager
 State change: SUSPENDED
09/08/2021 09:02:22.574 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Unable to 
read additional data from server sessionid 0x30000016c7d978a, likely server has 
closed socket, closing socket connection and attempting reconnect
09/08/2021 09:02:22.573 -0400 INFO 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn Unable to 
read additional data from server sessionid 0x30000016c7d9789, likely server has 
closed socket, closing socket connection and attempting reconnect
