Hi Anjana, When using local file system, Docker containers started during the pipeline execution can only access container's local filesystem. Also, multiple containers are started during pipeline execution which do not have access to other container's file system.
So, in this case, files created by container 1 was not accessible to container 2 which might have been started later. Their are a few ways to solve this problem. 1. Use globally accessible file system HDFS, GCS etc. 2. Use loopback worker which does not use docker containers and hence use host's native local filesystem. Command: python -m apache_beam.examples.wordcount --input=local_input_file --output=local_output_file --job_endpoint=localhost:8099 --experiments beam_fn_api --runner=PortableRunner --environment_cache_millis=10000 3. Add caching to the docker containers. This is not 100% reliable as even after caching, its possible that multiple containers get started. Command: python -m apache_beam.examples.wordcount --input=local_input_file --output=local_output_file --job_endpoint=localhost:8099 --experiments beam_fn_api --runner=PortableRunner --environment_type=LOOPBACK On Tue, May 28, 2019 at 7:30 PM Anjana Pydi <[email protected]> wrote: > > > Hi Team, > > I am trying to run Apache Beam Python word count example on Apache's Flink > with PortableRunner using a SDK harness/Job Server via Docker. > > 1.After building SDK harness container using ./gradlew -p > sdks/python/container docker , It gives below error when trying to do > docker pull : > > Using default tag: latest Error response from daemon: Get > https://$userId-docker- > apache.bintray.io/v2/: x509: certificate is valid for *.bintray.io, > bintray.io, not $userId-docker- apache.bintray.io > > 2. Started Flink portable Jobservice endpoint using ./gradlew > beam-runners-flink_1.5-job-server:runShadow.When trying to run apache > beam word count example using below command, > > python -m apache_beam.examples.wordcount --input=local_input_file > --output=local_output_file --job_endpoint=localhost:8099 > --experiments beam_fn_api --runner=PortableRunner > > > It gives below error: > > File > "/usr/local/lib/python2.7/site-packages/apache_beam/io/localfilesystem.py", > line 134, in _path_open > raw_file = open(path, mode) > RuntimeError: IOError: [Errno 2] No such file or directory: > '/Users/$UserId/Desktop/Beam/output/beam-temp-wordsout.txt-162ea1c67b3311e9aa99025000000001/b6e6490f-9c73-4cae-9344-6400b6798eb7.wordsout.txt' > [while running 'write/Write/WriteImpl/FinalizeWrite'] > > I have few questions like below - > > 1. When I check in /usr/local/lib, I can not see python2.7 folder there > but the error points to that location. I want to understand how it is being > done and if there is any way to point it to virtual environment python > location? > 2. How to fix the docker image x509 certificate issue. I installed > certificates from openssl but it dint fix the issue. > 3. Any detailed documentation on how to make wordcount example work with > PortableRunner via Docker. > > I have posted a question on stack overflow before on the same context. > Below is the link: > > https://stackoverflow.com/questions/56050608/apache-beam-python-word-count-example-is-failing-for-flink-runner-with-beamioerr > > Please let me know in case if any clarifications needed. > > Thanks, > Anjana > > > ----------------------------------------------------------------------------------------------------------------------- > The information contained in this communication is intended solely for the > use of the individual or entity to whom it is addressed and others > authorized to receive it. It may contain confidential or legally privileged > information. If you are not the intended recipient you are hereby notified > that any disclosure, copying, distribution or taking any action in reliance > on the contents of this information is strictly prohibited and may be > unlawful. If you are not the intended recipient, please notify us > immediately by responding to this email and then delete it from your > system. Bahwan Cybertek is neither liable for the proper and complete > transmission of the information contained in this communication nor for any > delay in its receipt. >
