Hi,

We have a Dataflow pipeline written with the Apache Beam Python SDK, and we are wondering whether we can run third-party code (written in Perl) from inside the pipeline. Essentially, we want to run

    perl myscript.pl $DATA

for every element DATA of a PCollection passed to a DoFn, and write the results back to BigQuery. We could set up a server for myscript.pl and have each worker send it an HTTP/RPC request instead, but we are wondering whether it is possible to run the script directly inside the Beam worker, or even through a Docker container that packages our Perl script. If so, how? What do you think of this approach, and are there any caveats we should be aware of?
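To make the question concrete, here is roughly the kind of DoFn we are imagining, as a minimal, untested sketch that shells out via subprocess; the input path, the BigQuery table, and the two-column schema are placeholders, not our real setup:

import subprocess

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class RunPerlScript(beam.DoFn):
    """Runs myscript.pl once per element and yields its stdout."""

    def process(self, data):
        # Pass the element as a separate argv entry rather than
        # interpolating it into a shell string, to avoid quoting issues.
        result = subprocess.run(
            ['perl', 'myscript.pl', data],
            capture_output=True,
            text=True,
            check=True,  # raise on a non-zero exit so Beam retries the bundle
        )
        yield {'input': data, 'output': result.stdout.strip()}


with beam.Pipeline(options=PipelineOptions()) as p:  # Dataflow options omitted
    (
        p
        | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input.txt')
        | 'RunPerl' >> beam.ParDo(RunPerlScript())
        | 'Write' >> beam.io.WriteToBigQuery(
            'my-project:my_dataset.my_table',
            schema='input:STRING,output:STRING',
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )

For the Docker route, we understand Dataflow supports custom worker containers (the --sdk_container_image pipeline option), so we could presumably bake perl and myscript.pl into the SDK container image; is that the recommended way to get the script onto the workers?

Thanks!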
