Hello! I am trying to package a Beam Dataflow pipeline as a self-executing jar using these <https://beam.apache.org/documentation/runners/dataflow/#self-executing-jar> instructions. However, I am running into a weird issue when attempting to execute the jar.
My pipeline needs to read a file (an Avro schema, .avsc) from GCS outside of a PCollection, before any PCollection work starts. To do that I use the FileSystems API (a rough sketch of the read is at the bottom of this mail). This works perfectly fine when I execute the pipeline via mvn compile exec:java ... However, if I run the same pipeline as a jar, it appears to treat the GCS path as a local path and fails with a FileNotFoundException:

Exception in thread "main" java.io.FileNotFoundException: /some/local/filesystem/path/myproject/gs:/my-gcs-bucket/schema/my-schema.avsc (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:113)
        at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:78)
        at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:262)

(Note that the input path is passed with the correct double slash, e.g. --inputPath=gs://my-gcs-bucket/schema/my-schema.avsc; the error message seems to strip one of the slashes out.)

Any pointers on what might be causing this?

Thanks,
- Sameer
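P.S. The pre-pipeline read is essentially the sketch below (a minimal, self-contained version I wrote up just for this mail; the class name and the hard-coded path are placeholders, the real code takes the path from our custom --inputPath option):

import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;

import org.apache.avro.Schema;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.MatchResult;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SchemaReadExample {

  // Reads the .avsc schema from GCS before the pipeline graph is built.
  static Schema readSchema(String path, PipelineOptions options) throws IOException {
    // Make the registered file systems (including gs:// when the GCP extensions
    // are on the classpath) available to the FileSystems API.
    FileSystems.setDefaultPipelineOptions(options);

    // Resolve the single gs:// path and open it as a stream.
    MatchResult.Metadata metadata = FileSystems.matchSingleFileSpec(path);
    try (InputStream in = Channels.newInputStream(FileSystems.open(metadata.resourceId()))) {
      return new Schema.Parser().parse(in);
    }
  }

  public static void main(String[] args) throws IOException {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Schema schema = readSchema("gs://my-gcs-bucket/schema/my-schema.avsc", options);
    System.out.println(schema);
    // ... pipeline construction follows, using the parsed schema ...
  }
}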
