Matt, is the machine from which you are launching the pipeline different from the one where it will run?
If that's the case, make sure the launching machine has all the HDFS environment variables set, as the pipeline is configured on the launching machine before it reaches the workers.

Good luck,
JC

On Mon, Jan 28, 2019, 13:34 Matt Casters <[email protected]> wrote:

> Dear Beam friends,
>
> In preparation for my presentation of the Kettle Beam work in London next
> week, I've been trying to get Beam on Spark to run, which worked in the end.
> The problem, however, has resurfaced... once again... with a vengeance:
>
> java.lang.IllegalArgumentException: No filesystem found for scheme hdfs
>
> I configured HADOOP_HOME and HADOOP_CONF_DIR, ran
> FileSystems.setDefaultPipelineOptions(pipelineOptions), and tried every
> trick in the book (very few of those are to be found), but it's a fairly
> brutal trial-and-error process.
>
> Given that I'm not the only person hitting these issues, I think it would
> be a good idea to allow for some sort of feedback from the FileSystems
> loading process: which filesystems it tries to load, which fail, and so on.
> Also, the Maven library situation is a bit fuzzy, in the sense that there
> are libraries like beam-sdks-java-io-hdfs on a point release (0.6.0) as
> well as beam-sdks-java-io-hadoop-file-system on the latest version.
>
> I've expanded my trial-and-error pattern to its endpoint and am ready to
> give up on Beam-on-Spark. I could try to get a Spark test environment
> configured for s3://, but I don't think that's all that representative of
> real-world scenarios.
>
> Thanks anyway in advance for any suggestions,
>
> Matt
> ---
> Matt Casters <[email protected]>
> Senior Solution Architect, Kettle Project Founder
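For anyone hitting the same wall, here is a minimal sketch of the registration step JC and Matt are describing, done explicitly on the launch machine instead of relying on HADOOP_CONF_DIR / YARN_CONF_DIR being visible there. It assumes beam-sdks-java-io-hadoop-file-system is on the launch classpath; the namenode host and port are placeholders.

import java.util.Collections;

import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

public class HdfsSchemeCheck {
  public static void main(String[] args) {
    // Hand Beam an explicit Hadoop configuration rather than depending on
    // the HADOOP_CONF_DIR / YARN_CONF_DIR environment variables, which may
    // be set on the cluster but not on the machine launching the pipeline.
    HadoopFileSystemOptions options =
        PipelineOptionsFactory.fromArgs(args).as(HadoopFileSystemOptions.class);

    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder address
    options.setHdfsConfiguration(Collections.singletonList(conf));

    // Registers every FileSystem implementation found on the classpath,
    // including HadoopFileSystem when the hadoop-file-system module is present.
    FileSystems.setDefaultPipelineOptions(options);

    // Quick probe: this throws "No filesystem found for scheme hdfs" right
    // here, on the launch machine, if registration did not succeed.
    FileSystems.matchNewResource("hdfs://namenode:8020/tmp/probe", false);
  }
}

The probe at the end is just a convenience: it surfaces the registration failure before the pipeline is submitted to Spark, which makes the trial-and-error loop Matt describes considerably shorter.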
