Well, my file is not in my local filesystem. It's in GS. This is the line of
code that reads the input file:

    p.apply(TextIO.Read.from("gs://apache-beam-samples/shakespeare/*"))
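Whether a spec like that is served from Google Cloud Storage or from a worker's
local disk comes down to its URI scheme. A minimal stdlib sketch of that
classification (illustrative only, not Beam's actual dispatch code):

```java
import java.net.URI;

// Illustrative only -- not Beam's code. A runner decides which filesystem
// an input spec refers to by its URI scheme: "gs" means Google Cloud
// Storage, "file" (or no scheme at all) means the local filesystem of
// whichever machine performs the read.
public class SchemeCheck {
    static String schemeOf(String spec) {
        String scheme = URI.create(spec).getScheme();
        // A bare path like "/tmp/input.txt" has no scheme and is local.
        return scheme == null ? "file" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("gs://apache-beam-samples/shakespeare/kinglear.txt")); // gs
        System.out.println(schemeOf("file:///tmp/kinglear.txt"));                          // file
        System.out.println(schemeOf("/tmp/kinglear.txt"));                                 // file
    }
}
```

This is why, on an external cluster, a local path only works if every executor
happens to have that file at that path.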
And this page https://beam.apache.org/get-started/quickstart/ says the
following: "you can't access a local file if you are running the pipeline on
an external cluster". I'm indeed trying to run a pipeline on a standalone
Spark cluster running on my local machine. So local files are not an option.

> On Jan 23, 2017, at 4:41 PM, Amit Sela <amitsel...@gmail.com> wrote:
>
> Why not try file:// instead ? It doesn't seem like you're using Google
> Storage, right ? I mean the input file is on your local FS.
>
> On Mon, Jan 23, 2017 at 11:34 PM Chaoran Yu <chaoran...@lightbend.com> wrote:
> No, I'm not using Dataproc.
> I'm simply running on my local machine. I started a local Spark cluster
> with sbin/start-master.sh and sbin/start-slave.sh. Then I submitted my
> Beam job to that cluster.
> The gs file is the kinglear.txt from Beam's example code and it should be
> public.
>
> My full stack trace is attached.
>
> Thanks,
> Chaoran
>
>> On Jan 23, 2017, at 4:23 PM, Amit Sela <amitsel...@gmail.com> wrote:
>>
>> Maybe. Are you running on Dataproc ? Are you using YARN/Mesos ? Do the
>> machines hosting the executor processes have access to GS ? Could you
>> paste the entire stack trace ?
>>
>> On Mon, Jan 23, 2017 at 11:21 PM Chaoran Yu <chaoran...@lightbend.com> wrote:
>> Thank you Amit for the reply.
>>
>> I just tried two more runners and below is a summary:
>>
>> DirectRunner: works.
>> FlinkRunner: works in local mode. I got an error "Communication with
>> JobManager failed: lost connection to the JobManager" when running in
>> cluster mode.
>> SparkRunner: works in local mode (mvn exec command) but fails in cluster
>> mode (spark-submit) with the error I pasted in the previous email.
>>
>> In SparkRunner's case, can it be that the Spark executor can't access the
>> gs file in Google Storage?
>>
>> Thank you,
>>
>>> On Jan 23, 2017, at 3:28 PM, Amit Sela <amitsel...@gmail.com> wrote:
>>>
>>> Is this working for you with other runners ? Judging by the stack trace,
>>> it seems like IOChannelUtils fails to find a handler, so it doesn't seem
>>> like it is a Spark-specific problem.
>>>
>>> On Mon, Jan 23, 2017 at 8:50 PM Chaoran Yu <chaoran...@lightbend.com> wrote:
>>> Thank you Amit and JB!
>>>
>>> This is not related to DC/OS itself, but I ran into a problem when
>>> launching a Spark job on a cluster with spark-submit. My Spark job,
>>> written in Beam, can't read the specified gs file. I got the following
>>> error:
>>>
>>> Caused by: java.io.IOException: Unable to find handler for
>>> gs://beam-samples/sample.txt
>>>     at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
>>>     at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:528)
>>>     at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:271)
>>>     at org.apache.beam.runners.spark.io.SourceRDD$Bounded$1.hasNext(SourceRDD.java:125)
>>>
>>> Then I thought about switching to reading from another source, but I saw
>>> in Beam's documentation that TextIO can only read from files in Google
>>> Cloud Storage (prefixed with gs://) when running in cluster mode. How do
>>> you guys do file IO in Beam when using the SparkRunner?
>>>
>>> Thank you,
>>> Chaoran
>>>
>>>> On Jan 22, 2017, at 4:32 AM, Amit Sela <amitsel...@gmail.com> wrote:
>>>>
>>>> I'll join JB's comment on the Spark runner: submitting Beam pipelines
>>>> using the Spark runner can be done with Spark's spark-submit script.
>>>> Find out more in the Spark runner documentation:
>>>> https://beam.apache.org/documentation/runners/spark/
>>>>
>>>> Amit.
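The "Unable to find handler" line in the stack trace above is a lookup
failure: IOChannelUtils maps a URI scheme to a registered factory, and the
JVM doing the read (here, a Spark executor) has no factory registered for
the "gs" scheme, typically because the GCS extension classes never made it
onto the executor's classpath. A simplified model of that lookup (this is
NOT Beam's code; the factory names are only placeholders echoing Beam's
class names of that era):

```java
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Simplified model of IOChannelUtils.getFactory: scheme -> factory lookup
// that throws when nothing is registered for the requested scheme.
public class HandlerRegistry {
    private final Map<String, String> factories = new HashMap<>();

    void register(String scheme, String factoryName) {
        factories.put(scheme, factoryName);
    }

    String getFactory(String spec) throws IOException {
        String factory = factories.get(URI.create(spec).getScheme());
        if (factory == null) {
            // Same message shape as the stack trace in this thread.
            throw new IOException("Unable to find handler for " + spec);
        }
        return factory;
    }

    public static void main(String[] args) throws IOException {
        HandlerRegistry executor = new HandlerRegistry();
        // Only local-file IO registered, as on an executor missing the GCS jars.
        executor.register("file", "FileIOChannelFactory");
        System.out.println(executor.getFactory("file:///tmp/a.txt"));
        executor.getFactory("gs://beam-samples/sample.txt"); // throws IOException
    }
}
```

Under this reading, the fix is a classpath/packaging question (get the GCS
handler onto the executors), not a Spark-runner bug, which matches Amit's
observation that it is not Spark-specific.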
>>>> On Sun, Jan 22, 2017 at 8:03 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>> Hi,
>>>>
>>>> Not directly DC/OS (I think Stephen did some tests on it), but I have a
>>>> platform running Spark and Flink with Beam on Mesos + Marathon.
>>>>
>>>> It basically doesn't need anything special, as running pipelines uses
>>>> spark-submit (as in Spark "natively").
>>>>
>>>> Regards,
>>>> JB
>>>>
>>>> On 01/22/2017 12:56 AM, Chaoran Yu wrote:
>>>> > Hello all,
>>>> >
>>>> > Has anyone had experience using Beam on DC/OS? I want to run Beam code
>>>> > executed with the Spark runner on DC/OS. As a next step, I would like
>>>> > to run the Flink runner as well. There doesn't seem to exist any
>>>> > information about running Beam on DC/OS that I can find on the web,
>>>> > so some pointers are greatly appreciated.
>>>> >
>>>> > Thank you,
>>>> > Chaoran Yu
>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
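To make the spark-submit route JB and Amit describe concrete, here is a
hypothetical invocation. The class name, jar path, and master URL are
placeholders, not taken from this thread; see the Spark runner documentation
linked above for the authoritative form. The jar should be a shaded (fat)
jar, e.g. built with the maven-shade-plugin, so that Beam's SDK, the Spark
runner, and the GCS IO classes all travel to the executors:

```shell
# Placeholder names throughout -- adjust --class, the jar path, and
# --master to your own project and cluster.
spark-submit \
  --class com.example.MyBeamPipeline \
  --master spark://localhost:7077 \
  target/my-beam-pipeline-bundled-0.1.jar \
  --runner=SparkRunner \
  --inputFile=gs://apache-beam-samples/shakespeare/kinglear.txt
```

Arguments after the jar are passed to the pipeline's main method, which is
where Beam's PipelineOptions (like --runner) are parsed.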