Hi, Community. We ran into an issue with one of our Flink streaming jobs running on 1.14.3. The job does not have JobManager HA enabled. After the flink-main-container of the only JobManager pod restarted, some of the TaskManager pods keep throwing the exception below:

INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Flink_Kafka_Source -> Filter -> Flat Map (9/16) (3bcddb02cc472fb56be015192a38cf22) switched from DEPLOYING to FAILED on test-taskmanager-1-7 @ 172.17.115.150 (dataPort=46850).
java.lang.IllegalStateException: The library registration references a different set of library BLOBs than previous registrations for this job:
old:[p-a9f09d52fa47cb7e8707c6d5dbc48de396ae1ab4-54b6f15240547960e63b5d691a53c32f]
new:[p-a9f09d52fa47cb7e8707c6d5dbc48de396ae1ab4-882cb6782baa7adf74a1189c77ccb856]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.verifyClassLoader(BlobLibraryCacheManager.java:416) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.access$500(BlobLibraryCacheManager.java:356) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:232) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:199) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:333) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1047) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:637) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]

I can confirm that the main jar is identical, even after the JM pod's flink-main-container restarted.

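The two BLOB keys in the exception message seem consistent with the jar being unchanged. Below is a minimal sketch that compares them, assuming each key has the layout `<type>-<content hash>-<suffix>`. That layout is my reading of the key strings, and the helper names `split_blob_key` and `compare_blob_keys` are only illustrative, not Flink APIs.

```python
# Keys copied verbatim from the exception message.
OLD_KEY = "p-a9f09d52fa47cb7e8707c6d5dbc48de396ae1ab4-54b6f15240547960e63b5d691a53c32f"
NEW_KEY = "p-a9f09d52fa47cb7e8707c6d5dbc48de396ae1ab4-882cb6782baa7adf74a1189c77ccb856"


def split_blob_key(key):
    """Split a key string into (type prefix, content hash, suffix).

    Assumption: the key is laid out as <type>-<content hash>-<suffix>.
    """
    prefix, content_hash, suffix = key.split("-")
    return prefix, content_hash, suffix


def compare_blob_keys(old, new):
    """Report which parts of two key strings match."""
    _, old_hash, old_suffix = split_blob_key(old)
    _, new_hash, new_suffix = split_blob_key(new)
    return {
        "same_content_hash": old_hash == new_hash,
        "same_suffix": old_suffix == new_suffix,
    }


result = compare_blob_keys(OLD_KEY, NEW_KEY)
print(result)  # {'same_content_hash': True, 'same_suffix': False}
```

If that assumption about the layout holds, the first part of the key is the same before and after the restart, and only the second part changed.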
Could anyone help explain why the job threw the above exception and what I can do to avoid it?