Re: Downloading and executing additional jar file when using Python API

Robert Bradshaw via user Wed, 24 Jan 2024 08:30:21 -0800

You can also manually designate a replacement jar to be used rather
than fetching the jar from Maven, either as a pipeline option or (as
of the next release) as an environment variable. The format is a JSON
mapping from gradle targets (which is how we identify these jars) to
local files (or urls). For example, pass

  --beam_services='{":sdks:java:extensions:sql:expansion-service:shadowJar": "/path/to/your/copy.jar"}'

to use the local jar to automatically expand your SQL transforms.

See the docs at
https://github.com/apache/beam/blob/7e95776a8d08ef738be49ef47842029c306f2bf5/sdks/python/apache_beam/options/pipeline_options.py#L587
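
A minimal sketch of the option above, assuming a Beam version whose pipeline_options.py defines `--beam_services` (as the linked file does): the JSON mapping can be built with Python's `json` module instead of being hand-quoted on the command line. The Gradle target is the one from the example; `/path/to/your/copy.jar` is a placeholder, and `beam_services_flag` is a hypothetical helper name.

```python
import json
import shlex

# Gradle target that identifies the SQL expansion service jar (from the example above).
SQL_EXPANSION_TARGET = ":sdks:java:extensions:sql:expansion-service:shadowJar"


def beam_services_flag(overrides):
    """Hypothetical helper: build a --beam_services=... argument.

    `overrides` maps Gradle targets to local jar paths (or URLs).
    """
    # json.dumps emits the JSON mapping, so braces, colons and quotes are correct.
    return "--beam_services=" + json.dumps(overrides)


flag = beam_services_flag({SQL_EXPANSION_TARGET: "/path/to/your/copy.jar"})
print(flag)
# Shell-quoted form, for pasting into a command line.
print(shlex.quote(flag))
```

The unquoted string can go straight into the flag list handed to `PipelineOptions`, e.g. `PipelineOptions([flag])`, while the quoted form is what a shell invocation needs.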

On Tue, Jan 23, 2024 at 5:59 PM Chamikara Jayalath via user
<[email protected]> wrote:
>
> The expansion service jar is needed since sql.py includes cross-language 
> transforms that use the Java implementation under the hood.
>
> Once downloaded, the jar is cached, and subsequent jobs should use the jar 
> from that location.
>
> If you want to use a locally available jar, you can manually start up an 
> expansion service [1] and point the Python SQL transform to that [2].
>
> Thanks,
> Cham
>
> [1] 
> https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/#choose-an-expansion-service
> [2] 
> https://github.com/apache/beam/blob/7ff25d896250508570b27683bc76523ac2fe3210/sdks/python/apache_beam/transforms/sql.py#L84
>
> On Tue, Jan 23, 2024 at 3:57 PM Mark Striebeck <[email protected]> 
> wrote:
>>
>> Hi,
>>
>> Sorry, this question seems so obvious that I'm sure it has come up before. But I 
>> couldn't find anything in the docs or the mail archives. Feel free to point 
>> me in the right direction...
>>
>> We are using the Python API for Beam. Recently we started using Beam SQL, 
>> which apparently needs a jar file that is not provided with the Python pip 
>> package. When I run tests, I can see that Beam downloads 
>> beam-sdks-java-extensions-sql-expansion-service-2.52.0.jar, unpacks it 
>> into ~/.apache_beam, and uses it to start an RPC server.
>>
>> While this works for local testing, I am trying to figure out how to work 
>> this into our CI and deployment process.
>>
>> Ideally, we would download a pip package that bundles this jar (and others) 
>> and just use it.
>>
>> If that doesn't exist (I couldn't find it), then we'd need to check this jar 
>> file into our source tree, so that we can use it for CI and also make it 
>> part of the Docker image that we use to run our Beam pipelines on GCP 
>> Dataflow. How can I tell Beam to use that file instead of downloading it? 
>> I tried obvious settings like the CLASSPATH environment variable, but nothing 
>> works. Beam always tries to fetch the file from Maven.
>>
>> Again, feel free to point me to any relevant mail discussion or web page.
>>
>> Thanks
>>      Mark
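
For the CI and Docker workflow Mark describes, Cham's suggestion could be sketched roughly as follows: find the jar Beam already cached (Mark saw it under ~/.apache_beam), vendor that copy, and start it yourself as an expansion service. The helper names are hypothetical, port 8097 is arbitrary, and the `java -jar <jar> <port>` invocation is an assumption to verify against the multi-language pipelines docs linked above.

```python
from pathlib import Path

# Jar name observed in the thread; change the version to match your Beam release.
JAR_NAME = "beam-sdks-java-extensions-sql-expansion-service-2.52.0.jar"


def find_cached_jar(jar_name, cache_root="~/.apache_beam"):
    """Hypothetical helper: return the first copy of jar_name under cache_root, or None."""
    root = Path(cache_root).expanduser()
    if not root.is_dir():
        return None
    # rglob searches every subdirectory, so the exact cache layout does not matter.
    matches = sorted(root.rglob(jar_name))
    return matches[0] if matches else None


def expansion_service_command(jar_path, port):
    """Hypothetical helper: command line to run the jar as a standalone expansion service.

    Assumption: the expansion service jar takes the port as its first argument.
    """
    return ["java", "-jar", str(jar_path), str(port)]


if __name__ == "__main__":
    jar = find_cached_jar(JAR_NAME)
    if jar is None:
        print("Jar not found under ~/.apache_beam; run a SQL pipeline once first.")
    else:
        # Copy `jar` into the source tree or Docker image, then launch it,
        # e.g. by passing this command to subprocess.Popen.
        print(" ".join(expansion_service_command(jar, 8097)))
```

With the service running, the Python side could then point at it through the `expansion_service` argument of `SqlTransform`, e.g. `SqlTransform(query, expansion_service="localhost:8097")`, assuming a `host:port` string is accepted there.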
