That was it! Thanks Lukasz. I had to use a custom assembly to get around
this. Thanks!
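
In case it helps anyone else who hits this, below is a rough sketch of a quick check for whether the service files described in the quoted reply survived packaging. It uses only the JDK: it lists every provider class named in the META-INF/services file for a given service on the classpath. The class name ServiceFileCheck is made up for illustration, and the default service name (org.apache.beam.sdk.io.FileSystemRegistrar) is my assumption about what FileSystems looks up, not something stated in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; not part of Beam.
public class ServiceFileCheck {

    // Assumed service interface name; pass a different one as the first argument if needed.
    public static final String DEFAULT_SERVICE = "org.apache.beam.sdk.io.FileSystemRegistrar";

    /** Strips a trailing '#' comment and surrounding whitespace from one service-file line. */
    public static String parseLine(String line) {
        int hash = line.indexOf('#');
        if (hash >= 0) {
            line = line.substring(0, hash);
        }
        return line.trim();
    }

    /** Collects provider class names from every copy of the service file visible to the loader. */
    public static List<String> listProviders(ClassLoader loader, String service) throws IOException {
        List<String> providers = new ArrayList<>();
        for (URL url : Collections.list(loader.getResources("META-INF/services/" + service))) {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String name = parseLine(line);
                    if (!name.isEmpty()) {
                        providers.add(name);
                    }
                }
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        String service = args.length > 0 ? args[0] : DEFAULT_SERVICE;
        List<String> providers = listProviders(ServiceFileCheck.class.getClassLoader(), service);
        System.out.println(service + " providers found: " + providers.size());
        for (String provider : providers) {
            System.out.println("  " + provider);
        }
    }
}
```

Running this once with the executable jar on the classpath and once with the classpath that mvn exec:java uses should make any difference visible. If a registrar that shows up under Maven is missing from the jar's list, the packaging step is dropping or overwriting the service file.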

On Thu, Jun 21, 2018 at 3:28 PM Lukasz Cwik <lc...@google.com> wrote:

> The FileSystems API uses a ServiceLoader[1] to find Apache Beam FileSystem
> implementations. The ServiceLoader works by finding "service" files on the
> classpath containing a list of classes implementing the Apache Beam
> FileSystem API. The way in which you're creating an executable jar is likely
> dropping or incorrectly merging service files. The most common case is that
> you're using the Maven shade plugin and you haven't configured it to use the
> services file resource transformer[2]. If you are packaging your executable
> jar a different way, you'll want to look up the documentation for your tool
> and see how it can properly deal with the service files.
>
> 1: https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html
> 2:
> https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
>
> On Thu, Jun 21, 2018 at 12:06 PM Sameer Abhyankar <saabhyan...@google.com>
> wrote:
>
>> Hello!
>>
>> I am trying to package a Beam Dataflow pipeline as a self-executing jar
>> using these instructions:
>> <https://beam.apache.org/documentation/runners/dataflow/#self-executing-jar>
>> However, I am running into a weird issue when attempting to execute this
>> jar.
>>
>> My pipeline needs to read a file (avro schema .avsc) from GCS outside of
>> a PCollection before starting to work with PCollections. In order to do
>> that I use the FileSystems API. This works perfectly fine when I execute
>> the pipeline via mvn compile exec:java ..
>>
>> However, if I attempt to run this as a jar, it appears to treat the GCS
>> path as local and fails with a FileNotFoundException.
>>
>> Exception in thread "main" java.io.FileNotFoundException:
>> /some/local/filesystem/path/myproject/gs:/my-gcs-bucket/schema/my-schema.avsc
>> (No such file or directory)
>>     at java.io.FileInputStream.open0(Native Method)
>>     at java.io.FileInputStream.open(FileInputStream.java:195)
>>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>     at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:113)
>>     at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:78)
>>     at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:262)
>>
>> (Note that the input path is passed correctly with the double slash, e.g.
>> --inputPath=gs://my-gcs-bucket/schema/my-schema.avsc, but the error
>> message seems to strip one of the slashes out.)
>>
>> Any pointers on what might be causing this?
>>
>> Thanks,
>> - Sameer
>>
>
