Hi Mark,

Two quick questions that might help us understand what's going on.

- Does this error happen for all of your DataSet jobs? For a problematic job, does it happen for every container?
- What is `jobs.jar`? Is it under `lib/` or `opt/` of your client-side filesystem, or is it specified via `yarn.ship-files`, `yarn.ship-archives` or `yarn.provided.lib.dirs`? This will help us locate the code path this file went through.
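(For reference, the shipping options mentioned above can be set in `flink-conf.yaml`; a minimal sketch with illustrative paths, not taken from your setup:)

```yaml
# flink-conf.yaml -- illustrative paths only
# Files shipped from the client to YARN for each application:
yarn.ship-files: /opt/flink/usrlib/jobs.jar
# Pre-uploaded remote lib dirs; contents are reused, not re-shipped:
yarn.provided.lib.dirs: hdfs:///flink/1.12/libs
```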
Thank you~

Xintong Song

On Sun, Jan 17, 2021 at 10:32 PM Mark Davis <moda...@protonmail.com> wrote:

> Hi all,
>
> I am upgrading my DataSet jobs from Flink 1.8 to 1.12.
> After the upgrade I started to receive errors like this one:
>
> 14:12:57,441 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager - Worker container_e120_1608377880203_0751_01_000112 is terminated.
> Diagnostics: Resource hdfs://bigdata/user/hadoop/.flink/application_1608377880203_0751/jobs.jar changed on src filesystem (expected 1610892446439, was 1610892446971
> java.io.IOException: Resource hdfs://bigdata/user/hadoop/.flink/application_1608377880203_0751/jobs.jar changed on src filesystem (expected 1610892446439, was 1610892446971
>     at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:257)
>     at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>     at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>     at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:228)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:221)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:209)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> I understand it is somehow related to FLINK-12195, but this time the error comes from the Hadoop code. I am running a very old version of the HDP platform, v2.6.5, so it might be the one to blame.
> But the code was working perfectly fine before the upgrade, so I am confused.
> Could you please advise?
>
> Thank you!
> Mark