Hi,

I'm using UDTFs in PyFlink, that depend upon a few resource files (JSON
schema files actually). The path of this file can be passed into the UDTF,
but essentially this path needs to exist on the Task Manager node where the
task executes. What's the best way to upload these resource files? As of
now, my custom Flink image creates a fixed path with the required resource
files, but I'd like it to be run time configurable.

There are 2 APIs available to load files when submitting a PyFlink job...

*stream_execution_environment.add_python_file()* - Recommended to upload
files (.py etc) but doesn't let me configure the final path on the target
node. The files are added to PYTHONPATH, but it needs the UDTF function to
lookup for this file. I'd like to pass the file location into the UDTF
instead.

*stream_execution_environment.add_python_archive()* - Appears to be more
generic, in the sense that it allows a target directory to be specified.
The documentation doesn't say anything about the contents of the archive,
so I'm guessing it could be any type of file. Is this what is needed for my
use case?

Or is there any other recommended way to upload non-Python
dependencies/resources?

Thanks in advance,
Sumeet

Reply via email to