The artifact staging directory (configurable via the "--artifacts_dir"
pipeline option) needs to be accessible to the workers, which in this case
are containerized. There are a couple of workarounds:

1. Don't stage files through the file system at all, i.e. use
--runner=FlinkRunner in Python instead of --runner=PortableRunner
(instructions on the Flink runner page [1]; see the sketch below).
2. Point --artifacts_dir to a distributed file system (GCS, S3, etc.).
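
For example, workaround 1 looks roughly like this (a minimal sketch; the
Flink master address and the pipeline contents are placeholders for your
own setup):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # With --runner=FlinkRunner, Beam starts the job server for you and
    # stages artifacts itself, so no shared --artifacts_dir is needed.
    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_master=localhost:8081",  # placeholder Flink REST address
        "--environment_type=DOCKER",      # containerized SDK workers
    ])

    with beam.Pipeline(options=options) as p:
        p | beam.Create(["hello", "beam"]) | beam.Map(print)

For workaround 2, you would keep the standalone job server but point its
artifacts dir at a shared location, e.g. gs://<your-bucket>/staging
(hypothetical bucket), so the worker containers can read the staged files.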

I also filed [3] because this happens a lot and our logs aren't very
helpful.

[1] https://beam.apache.org/documentation/runners/flink/
[2] https://issues.apache.org/jira/browse/BEAM-5440
[3] https://issues.apache.org/jira/browse/BEAM-12392

On Mon, May 24, 2021 at 9:57 AM Kenneth Knowles <k...@apache.org> wrote:

> Thanks for the details in the gist. I would guess the key to the error is
> this:
>
>     2021/05/24 13:45:34 Initializing java harness: /opt/apache/beam/boot
> --id=1-2 --provision_endpoint=localhost:33957
>     2021/05/24 13:45:44 Failed to retrieve staged files: failed to
> retrieve /tmp/staged in 3 attempts: failed to retrieve chunk for
> /tmp/staged/beam-sdks-java-io-expansion-service-2.28.0-oErG3M6t3yoDeSvfFRW5UCvvM33J-4yNp7xiK9glKtI.jar
>     caused by:
>     rpc error: code = Unknown desc = ;
>     ...
>
> Perhaps there is something wrong with the artifact provisioning endpoint?
> I can't help any further than this.
>
> Kenn
>
> On Mon, May 24, 2021 at 6:58 AM Nir Gazit <nir....@gmail.com> wrote:
>
>> Hey,
>> I'm having issues with running a simple pipeline on a remote Flink
>> cluster. I submit the job to a separate Beam job server, which then
>> forwards it to the Flink cluster with docker-in-docker enabled. For some
>> reason, I'm getting errors in Flink which I can't decipher (see gist
>> <https://gist.github.com/nirga/351d59dac972cb20786da7094161e748>).
>>
>> Can someone assist me?
>>
>> Thanks!
>> Nir
>>
>
