Hi Mark,

Two quick questions that might help us understand what's going on.
- Does this error happen for all of your DataSet jobs? For a problematic
job, does it happen for every container?
- What is the `jobs.jar`? Is it under `lib/` or `opt/` of your client-side
Flink distribution, or specified via `yarn.ship-files`, `yarn.ship-archives` or
`yarn.provided.lib.dirs`? This will help us locate the code path that this
file went through.
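For reference, those options live in `flink-conf.yaml`. The keys below are the standard Flink-on-YARN configuration entries; the paths are purely hypothetical placeholders, only to illustrate where a `jobs.jar` might be declared:

```yaml
# flink-conf.yaml -- hypothetical values for illustration only
yarn.ship-files: /path/to/jobs.jar            # local files shipped to YARN with the application
yarn.ship-archives: /path/to/archive.zip      # local archives shipped and unpacked on the containers
yarn.provided.lib.dirs: hdfs://bigdata/flink/libs  # pre-uploaded dirs on HDFS, reused instead of re-shipping
```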

Thank you~

Xintong Song



On Sun, Jan 17, 2021 at 10:32 PM Mark Davis <moda...@protonmail.com> wrote:

> Hi all,
> I am upgrading my DataSet jobs from Flink 1.8 to 1.12.
> After the upgrade I started to receive the errors like this one:
>
> 14:12:57,441 INFO
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager  -
> Worker container_e120_1608377880203_0751_01_000112 is terminated.
> Diagnostics: Resource
> hdfs://bigdata/user/hadoop/.flink/application_1608377880203_0751/jobs.jar
> changed on src filesystem (expected 1610892446439, was 1610892446971
> java.io.IOException: Resource
> hdfs://bigdata/user/hadoop/.flink/application_1608377880203_0751/jobs.jar
> changed on src filesystem (expected 1610892446439, was 1610892446971
>         at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:257)
>         at
> org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>         at
> org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>         at
> org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:228)
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:221)
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:209)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> I understand it is somehow related to FLINK-12195, but this time it comes
> from the Hadoop code. I am running a very old version of the HDP platform
> v.2.6.5 so it might be the one to blame.
> But the code was working perfectly fine before the upgrade, so I am
> confused.
> Could you please advise?
>
> Thank you!
>   Mark
>
