Thank you! With your tips I got it working with what seems like a clean
solution to me. Maybe this explanation will help someone else.
1) In the docker-compose.yaml file, I mount a volume on my project folder as
/opt/pyflink-walkthrough in the jobmanager container:
volumes:
  - .:/opt/pyflink-walkthrough
2) My PyFlink jobs are then submitted with the --pyFiles command-line option
pointing at the venv's site-packages folder:

docker-compose exec jobmanager ./bin/flink run \
  -py /opt/pyflink-walkthrough/some-pyflink-job.py -d \
  --pyFiles /opt/pyflink-walkthrough/lib/python3.8/site-packages
I'm using venv, so the packages my project requires live inside the project
folder. This lets me import packages at the top of my script file. Also, in
the future I can install new packages into my project's virtual environment
and they will be found by the Python compilation step in the jobmanager
container without rebuilding my Docker image or changing the
docker-compose.yaml.
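For anyone following along, the venv setup I used is roughly this (a sketch; the project folder name and the python3.8 in the site-packages path come from my environment and will vary with yours):

```shell
# Create the virtual environment directly in the project folder, so that
# lib/python3.8/site-packages sits under the volume mounted into the container.
cd pyflink-walkthrough   # the folder mounted at /opt/pyflink-walkthrough
python3 -m venv .
./bin/pip install python-dotenv

# The installed packages end up here, which is what --pyFiles points at:
ls lib/python3.8/site-packages
```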
Regards,
John
On Monday, November 22, 2021, 06:09:18 PM PST, Dian Fu
<[email protected]> wrote:
Does the exception `ModuleNotFoundError: No module named 'dotenv' ` occur
during job submission or during job running?
The argument `--pyRequirements` only takes effect during job running; that is,
it will download the dependencies specified via `--pyRequirements` and add them
to the PYTHONPATH of the Python worker.
For your problem, I guess you could try one of the following ways:
- If the exception occurs during job submission (job compiling), then I guess
you could try to move the `import` of `dotenv` from the header of the file to
where it's used, that is, turn it into a local import. This avoids importing
the library during job compilation.
- Use option `-pyfs` [1].
- Use option `-pyarch` [2] together with option `-pyexec` [3] and
`-pyclientexec` [4].
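A minimal sketch of the first suggestion, the local-import pattern (the function name load_config is illustrative, not from the original code):

```python
# A top-level import would be executed during job compilation on the client:
# from dotenv import load_dotenv   # <-- moved out of the module header

def load_config():
    # Local import: the module is only resolved when the function actually
    # runs on the Python worker, where the dependency is available.
    from dotenv import load_dotenv
    load_dotenv()
```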
Regards,
Dian
[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/dependency_management/#python-libraries
[2] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/dependency_management/#archives
[3] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/dependency_management/#python-interpreter
[4] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/dependency_management/#python-interpreter-of-client
On Tue, Nov 23, 2021 at 3:17 AM John Lafer <[email protected]> wrote:
Hi,
I'm running a PyFlink 1.13.1 cluster in Docker. The pyflink-walkthrough example
from the apache/flink-playgrounds repo works fine.
However, I'm unable to add a Python package dependency, like "python-dotenv",
to the Python job. I've tried the suggested techniques for specifying
dependencies in a requirements file as described in the Dependency Management
docs (i.e., --pyRequirements or using t_env.set_python_requirements), but they
seem to have no effect.
For example, in my Dockerfile I write my requirements.txt file into the
jobmanager image at /opt/flink/requirements.txt like so:

RUN echo "python-dotenv==0.19.2" >> /opt/flink/requirements.txt

I've verified its contents inside the jobmanager container.
And then from my client machine I submit the job into my Docker-based cluster
like this:

docker-compose exec jobmanager ./bin/flink run \
  -py /opt/pyflink-walkthrough/payment_msg_proccessing.py -d \
  --pyRequirements /opt/flink/requirements.txt
However, all attempts result in the job producing the error:
ModuleNotFoundError: No module named 'dotenv'
I'm wondering whether the technique I'm using works for Python jobs or only
for Python UDFs?
Help would be appreciated.