Hi community,

We hit an issue with one of our Flink streaming jobs running on 1.14.3.
JobManager HA is not enabled for this job. After the flink-main-container of
the only JobManager pod restarted, some of the TaskManager pods keep throwing
the exception below:

INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Flink_Kafka_Source -> Filter -> Flat Map (9/16) (3bcddb02cc472fb56be015192a38cf22) switched from DEPLOYING to FAILED on test-taskmanager-1-7 @ 172.17.115.150 (dataPort=46850).
java.lang.IllegalStateException: The library registration references a different set of library BLOBs than previous registrations for this job:
old:[p-a9f09d52fa47cb7e8707c6d5dbc48de396ae1ab4-54b6f15240547960e63b5d691a53c32f]
new:[p-a9f09d52fa47cb7e8707c6d5dbc48de396ae1ab4-882cb6782baa7adf74a1189c77ccb856]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.verifyClassLoader(BlobLibraryCacheManager.java:416) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.access$500(BlobLibraryCacheManager.java:356) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:232) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:199) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:333) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1047) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:637) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
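
One thing that stands out in the message is that the old and new keys seem to
differ only in their last component. Below is a minimal sketch that splits the
two key strings from the exception, assuming the layout is
`<type>-<content hash>-<suffix>`. That layout is my reading of the key strings,
not something confirmed here, and `split_blob_key` is a hypothetical helper
written for this post, not a Flink API.

```python
# Assumption: a BLOB key string has the form "<type>-<content hash>-<suffix>".
# split_blob_key is a hypothetical helper for illustration, not part of Flink.


def split_blob_key(key):
    """Split a key string into (type, content_hash, suffix)."""
    blob_type, content_hash, suffix = key.split("-")
    return blob_type, content_hash, suffix


# Key strings copied verbatim from the exception message.
old_key = "p-a9f09d52fa47cb7e8707c6d5dbc48de396ae1ab4-54b6f15240547960e63b5d691a53c32f"
new_key = "p-a9f09d52fa47cb7e8707c6d5dbc48de396ae1ab4-882cb6782baa7adf74a1189c77ccb856"

old_type, old_hash, old_suffix = split_blob_key(old_key)
new_type, new_hash, new_suffix = split_blob_key(new_key)

print("same content hash:", old_hash == new_hash)  # True
print("same suffix:", old_suffix == new_suffix)    # False
```

If that reading is right, the shared middle component is consistent with the
jar content being unchanged, and only the trailing component changed across
the restart.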

I can confirm that the main jar is identical even after the JM pod's
flink-main-container restarted.
Could anyone help explain why the job threw the above exception and what I can
do to avoid it?
