The artifact staging directory (configurable via the "--artifacts_dir" pipeline option) needs to be accessible to the workers, which in this case are containerized.
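To make that concrete, here is a minimal sketch of how these options can be passed from the Python SDK. It is not taken from this thread: the job endpoint, bucket name, and elements are placeholders, and the exact set of options depends on your setup.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder values; substitute your own job endpoint and bucket.
options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=DOCKER",
    # A local path such as /tmp/staged is not visible inside the worker
    # containers; a path on a distributed filesystem is.
    "--artifacts_dir=gs://my-bucket/beam-artifacts",
])

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | beam.Create(["a", "b", "c"])
     | beam.Map(print))

Note that if the job server is started separately (as in this setup), the artifacts directory may instead need to be configured on the job server itself rather than on the pipeline.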
There are a couple of workarounds:

1. Don't stage files through the file system at all, i.e. use --runner=FlinkRunner in Python instead of --runner=PortableRunner (instructions on the webpage [1]).
2. Point --artifacts_dir to a distributed file system (GCS, S3, etc.).

I also filed [3] because this happens a lot and our logs aren't very helpful.

[1] https://beam.apache.org/documentation/runners/flink/
[2] https://issues.apache.org/jira/browse/BEAM-5440
[3] https://issues.apache.org/jira/browse/BEAM-12392

On Mon, May 24, 2021 at 9:57 AM Kenneth Knowles <k...@apache.org> wrote:

> Thanks for the details in the gist. I would guess the key to the error is this:
>
> 2021/05/24 13:45:34 Initializing java harness: /opt/apache/beam/boot --id=1-2 --provision_endpoint=localhost:33957
> 2021/05/24 13:45:44 Failed to retrieve staged files: failed to retrieve /tmp/staged in 3 attempts: failed to retrieve chunk for /tmp/staged/beam-sdks-java-io-expansion-service-2.28.0-oErG3M6t3yoDeSvfFRW5UCvvM33J-4yNp7xiK9glKtI.jar
> caused by:
> rpc error: code = Unknown desc = ;
> ...
>
> There is something wrong with the artifact provisioning endpoint perhaps? I can't help any further than this.
>
> Kenn
>
> On Mon, May 24, 2021 at 6:58 AM Nir Gazit <nir....@gmail.com> wrote:
>
>> Hey,
>> I'm having issues with running a simple pipeline on a remote Flink cluster. I use a separate Beam job server to which I submit the job, which then goes to the Flink cluster with docker-in-docker enabled. For some reason, I'm getting errors in Flink which I can't decipher (see gist <https://gist.github.com/nirga/351d59dac972cb20786da7094161e748>).
>>
>> Can someone assist me?
>>
>> Thanks!
>> Nir
>>