Hi,

We have a Dataflow pipeline written with the Apache Beam Python SDK, and
we are wondering whether we can run third-party code (written in Perl) in
the pipeline. We basically want to run

perl myscript.pl $DATA

for every DATA in a PCollection passed to a DoFn, and write the results
back into BigQuery; a rough sketch of what we have in mind is below.
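Concretely, something like the following, where the script path, schema,
and BigQuery table are placeholders, and perl plus myscript.pl are assumed
to be present on every worker:

import subprocess

import apache_beam as beam

class RunPerlScript(beam.DoFn):
    def process(self, element):
        # Shell out to the bundled script once per element; assumes perl
        # and myscript.pl are available on the worker's filesystem.
        result = subprocess.run(
            ['perl', 'myscript.pl', element],
            capture_output=True, text=True, check=True)
        yield {'input': element, 'output': result.stdout.strip()}

with beam.Pipeline() as p:
    (p
     | beam.Create(['data1', 'data2'])  # stand-in for our real source
     | beam.ParDo(RunPerlScript())
     | beam.io.WriteToBigQuery(
           'my_project:my_dataset.my_table',  # placeholder table
           schema='input:STRING,output:STRING'))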
We could set up a server for myscript.pl and send an HTTP/RPC request to
it from each worker instead, but we are wondering whether it is possible
to run the script directly inside the Beam worker, or even via a Docker
container that packages the Perl script. If so, how? What do you think of
this approach? Are there any caveats we should be aware of?

Thanks!
