Hi,
Sorry, this question seems so obvious that I'm sure it has come up before,
but I couldn't find anything in the docs or the mail archives. Feel free to
point me in the right direction...
We are using the Python API for Beam. Recently we started using Beam SQL,
which apparently needs a jar file that is not shipped with the Python pip
package. When I run tests, I can see that Beam downloads
beam-sdks-java-extensions-sql-expansion-service-2.52.0.jar, stores it
under ~/.apache_beam, and uses it to start an RPC server.
While this works for local testing, I am trying to figure out how to work
this into our CI and deployment process.
Ideally, we would install a pip package that bundles this jar (and the
others) and have Beam simply use it.
If that doesn't exist (I couldn't find one), then we'd need to check this
jar file into our source tree, so that we can use it for CI and also make
it part of the Docker image that we use to run our Beam pipelines on GCP
Dataflow. How can I tell Beam to use that file instead of downloading it?
I tried obvious settings like the CLASSPATH environment variable, but
nothing works. Beam always tries to fetch the file from Maven.
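
To make the question concrete, here is a sketch of one way this might work.
It has not been verified against 2.52.0. It relies on SqlTransform accepting
an expansion_service argument and on JavaJarExpansionService (from
apache_beam.transforms.external) starting the service from a local jar. The
environment variable name, the jars/ directory, and the helper names are made
up for illustration and are not Beam conventions. A Java runtime is presumably
still needed wherever the pipeline is constructed.

```python
import os
from pathlib import Path

# Hypothetical names: neither the variable nor the directory is a Beam convention.
JAR_ENV_VAR = "BEAM_SQL_EXPANSION_JAR"
DEFAULT_JAR = Path("jars") / "beam-sdks-java-extensions-sql-expansion-service-2.52.0.jar"


def resolve_sql_jar(env=None):
    """Return the path of the vendored jar, preferring the env var override."""
    env = os.environ if env is None else env
    return Path(env.get(JAR_ENV_VAR, str(DEFAULT_JAR)))


def sql_transform(query, env=None):
    """Build a SqlTransform backed by a local jar rather than a Maven download."""
    jar = resolve_sql_jar(env)
    if not jar.is_file():
        raise FileNotFoundError(f"expansion service jar not found: {jar}")

    # Imported lazily so the path helper can be used without Beam installed.
    from apache_beam.transforms.external import JavaJarExpansionService
    from apache_beam.transforms.sql import SqlTransform

    # Assumption: passing an explicit expansion service skips the default
    # lookup that downloads the jar into ~/.apache_beam.
    return SqlTransform(query, expansion_service=JavaJarExpansionService(str(jar)))
```

In a pipeline this would be used as `rows | sql_transform("SELECT ...")`, with
the jar checked into the repo for CI and copied to the same relative path (or
pointed to via the environment variable) in the Docker image.
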
Again, feel free to point me to any relevant mail discussion or web page.
Thanks
Mark