Hi,
I have a PyFlink job that consists of:
- Multiple Python files.
- Multiple 3rdparty Python dependencies, specified in a
`requirements.txt` file.
- A few Java dependencies, mainly for external connectors.
- An overall job config YAML file.
Here's a simplified structure of the code layout:

flink/
├── deps
│   ├── jar
│   │   ├── flink-connector-kafka_2.11-1.12.2.jar
│   │   └── kafka-clients-2.4.1.jar
│   └── pip
│       └── requirements.txt
├── conf
│   └── job.yaml
└── job
    ├── some_file_x.py
    ├── some_file_y.py
    └── main.py
I'm able to execute this job locally by invoking something like:
python main.py --config <path_to_job_yaml>
I'm loading the jars inside the Python code, using env.add_jars(...).
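For context, the jar loading currently looks roughly like this (a minimal
sketch; the file:// paths are placeholders for my local deps/jar directory):

from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# Register the connector jars with the job; paths below are local placeholders.
env.add_jars(
    "file:///path/to/flink/deps/jar/flink-connector-kafka_2.11-1.12.2.jar",
    "file:///path/to/flink/deps/jar/kafka-clients-2.4.1.jar",
)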
Now, the next step is to submit this job to a Flink cluster running on K8s.
I'm looking for the best practices people tend to follow for packaging and
specifying dependencies. As per the documentation here [1], the various
Python files, including the conf YAML, can be specified using the --pyFiles
option, and Java dependencies can be specified using the --jarfile option.
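Based on my reading of [1], I'd expect the submission to look something like
this (just a sketch; paths are relative to my layout above, the Kubernetes
target options are omitted, and I'm assuming --jarfile takes a single jar):

./bin/flink run \
    --python job/main.py \
    --pyFiles job/,conf/job.yaml \
    --jarfile deps/jar/flink-connector-kafka_2.11-1.12.2.jar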
So, how can I specify third-party Python package dependencies? According to
another piece of documentation here [2], I should be able to specify the
requirements.txt directly inside the code and submit it via the --pyFiles
option. Is that right?
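In other words, something along these lines inside the job code (again just
a sketch based on my reading of [2]; the relative path to requirements.txt
is an assumption about how the shipped files end up on the cluster):

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)
# Declare the pip dependencies so they get installed for the Python workers.
t_env.set_python_requirements("deps/pip/requirements.txt")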
Are there any other best practices folks use to package/submit jobs?
Thanks,
Sumeet
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table-api-users-guide/dependency_management.html#python-dependency-in-python-program