Thank you! With your tips I got it working with what seems like a clean 
solution to me. Maybe this explanation will help someone else.
1) In the docker-compose.yaml file, I mount my project folder as a volume at 
/opt/pyflink-walkthrough in the jobmanager container:
  volumes:
    - .:/opt/pyflink-walkthrough
2) My PyFlink jobs are then executed with the --pyFiles command-line option to 
reference the site packages folder:
docker-compose exec jobmanager ./bin/flink run \
  -py /opt/pyflink-walkthrough/some-pyflink-job.py -d \
  --pyFiles /opt/pyflink-walkthrough/lib/python3.8/site-packages
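
If you script the submission, the command can be assembled from the two paths involved. This is only a minimal sketch: the helper name `flink_run_command` and its default arguments are hypothetical, and the paths are simply the ones from this message, so adjust them to your own layout and Python version.

```python
import shlex
from pathlib import PurePosixPath

# Mount point of the project folder inside the jobmanager container
# (taken from the volume mapping described above).
CONTAINER_ROOT = PurePosixPath("/opt/pyflink-walkthrough")


def flink_run_command(job_file, site_packages="lib/python3.8/site-packages"):
    """Build the docker-compose command that submits a job with --pyFiles.

    Both arguments are paths relative to the project folder on the host;
    they are translated to the corresponding paths inside the container.
    """
    return [
        "docker-compose", "exec", "jobmanager",
        "./bin/flink", "run",
        "-py", str(CONTAINER_ROOT / job_file),
        "-d",
        "--pyFiles", str(CONTAINER_ROOT / site_packages),
    ]


if __name__ == "__main__":
    # Print the command as a single shell-quoted line.
    print(shlex.join(flink_run_command("some-pyflink-job.py")))
```

The list form can be passed straight to `subprocess.run` if you prefer to launch the job from Python rather than copy the printed line.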

I'm using venv so the packages required by my project are within the project 
folder. This allows me to import packages at the top of my script file. Also, 
in the future I can install new packages into my project's virtual environment 
and they will be found by the Python compilation step in the jobmanager 
container without rebuilding my Docker image or changing 
docker-compose.yaml.
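
My understanding, which I have not checked against the Flink source, is that --pyFiles works here because the folder it names ends up on the Python module search path. The sketch below illustrates only that general Python mechanism, outside of Flink. The module `fake_dependency` and the helper `import_from_directory` are made up for the illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_directory(directory, module_name):
    """Import module_name after putting directory on the module search path."""
    if str(directory) not in sys.path:
        sys.path.insert(0, str(directory))
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# A temporary folder stands in for the venv's site-packages directory.
with tempfile.TemporaryDirectory() as tmp:
    source = "GREETING = 'hello from site-packages'\n"
    (Path(tmp) / "fake_dependency.py").write_text(source)
    module = import_from_directory(tmp, "fake_dependency")
    print(module.GREETING)
```

Once the folder is on the search path, a plain `import` at the top of the script resolves, which matches what I see with my job file.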
Regards,
John

    On Monday, November 22, 2021, 06:09:18 PM PST, Dian Fu 
<[email protected]> wrote:  
 
 Does the exception `ModuleNotFoundError: No module named 'dotenv'` occur 
during job submission or while the job is running?

The argument `--pyRequirements` only takes effect while the job is running: 
it downloads the dependencies specified via `--pyRequirements` and adds them 
to the PYTHONPATH of the Python worker.

For your problem, I guess you could try one of the following ways:
- If the exception occurs during job submission (job compiling), you could 
move the `import` of `dotenv` from the top of the file to the place where it 
is used, turning it into a local import. The library is then no longer 
imported during job compiling.
- Use option `-pyfs` [1]. 
- Use option `-pyarch`[2] together with option `-pyexec`[3] and 
`-pyclientexec`[4].
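
As a rough illustration of the first option, a local import could look like the sketch below. The function name `load_env_settings` and the `env_path` parameter are made up for this example; `dotenv_values` comes from the python-dotenv package.

```python
def load_env_settings(env_path=".env"):
    """Read key/value pairs from a .env file, importing dotenv only when called."""
    # Local import: this statement runs when the function is called, not when
    # the job script is first loaded, so loading the script does not require
    # the dotenv module to be installed.
    from dotenv import dotenv_values  # provided by the python-dotenv package

    return dict(dotenv_values(env_path))
```

The module is still needed wherever `load_env_settings` is actually called, so this only moves the requirement from load time to call time.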

Regards,
Dian

[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/dependency_management/#python-libraries
[2] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/dependency_management/#archives
[3] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/dependency_management/#python-interpreter
[4] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/dependency_management/#python-interpreter-of-client
On Tue, Nov 23, 2021 at 3:17 AM John Lafer <[email protected]> wrote:

Hi,
I'm running a PyFlink 1.13.1 cluster in Docker. The pyflink-walkthrough example 
from the Apache Flink Playgrounds repo (flink-playgrounds - Git at Google) 
works fine.

However, I'm unable to add a Python package dependency, like "python-dotenv" to 
the Python job. I've tried the suggested techniques for specifying dependencies 
in a requirements file as described in Dependency Management (i.e., 
--pyRequirements or using t_env.set_python_requirements) but they seem to have 
no effect.
For example, in my Dockerfile I copy my requirements.txt file into the 
jobmanager host at /opt/flink/requirements.txt like so:
RUN echo "python-dotenv==0.19.2" >> /opt/flink/requirements.txt
I've verified its contents inside the jobmanager container.
And then from my client machine I submit the job like this into my docker-based 
cluster:
docker-compose exec jobmanager ./bin/flink run \
  -py /opt/pyflink-walkthrough/payment_msg_proccessing.py -d \
  --pyRequirements /opt/flink/requirements.txt
However, all attempts result in the job producing the error: 
ModuleNotFoundError: No module named 'dotenv' 
I'm wondering if the technique I'm using works for Python jobs or only for 
Python UDFs?
Help would be appreciated.

  
