Thank you Amit for the reply. I just tried two more runners; below is a summary:
DirectRunner: works.
FlinkRunner: works in local mode. In cluster mode I got the error "Communication with JobManager failed: lost connection to the JobManager".
SparkRunner: works in local mode (mvn exec command) but fails in cluster mode (spark-submit) with the error I pasted in the previous email.

In SparkRunner's case, could it be that the Spark executor can't access the gs:// file in Google Cloud Storage?

Thank you,

> On Jan 23, 2017, at 3:28 PM, Amit Sela <amitsel...@gmail.com> wrote:
>
> Is this working for you with other runners? Judging by the stack trace, it
> seems like IOChannelUtils fails to find a handler, so it doesn't seem to be
> a Spark-specific problem.
>
> On Mon, Jan 23, 2017 at 8:50 PM Chaoran Yu <chaoran...@lightbend.com> wrote:
> Thank you Amit and JB!
>
> This is not related to DC/OS itself, but I ran into a problem when launching
> a Spark job on a cluster with spark-submit. My Spark job written in Beam
> can't read the specified gs:// file. I got the following error:
>
> Caused by: java.io.IOException: Unable to find handler for gs://beam-samples/sample.txt
>     at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
>     at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:528)
>     at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:271)
>     at org.apache.beam.runners.spark.io.SourceRDD$Bounded$1.hasNext(SourceRDD.java:125)
>
> Then I thought about switching to another source, but I saw in Beam's
> documentation that TextIO can only read from files in Google Cloud Storage
> (prefixed with gs://) when running in cluster mode. How do you handle file
> IO in Beam when using the SparkRunner?
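One common cause of an "Unable to find handler" error that appears only under spark-submit is the uber jar: Beam registers its IO handlers through Java's ServiceLoader, and if the fat jar is built without merging the META-INF/services files from all dependencies, the gs:// handler is silently dropped. A sketch of the fix, assuming the job jar is built with the maven-shade-plugin (versions and other project-specific configuration elided):

```xml
<!-- Sketch: merge META-INF/services files into the shaded jar so that
     ServiceLoader-based registrations (such as Beam's gs:// handler)
     survive the fat-jar build. Assumes a Maven Shade build. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenates META-INF/services entries from all dependencies
               instead of letting one jar's copy overwrite the others. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

It is also worth confirming that the Beam Google Cloud Platform IO dependency (which provides the gs:// handler) is actually included in the jar shipped to the Spark cluster, since the local mvn exec run may pick it up from the local Maven classpath while spark-submit does not.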
>
> Thank you,
> Chaoran
>
>> On Jan 22, 2017, at 4:32 AM, Amit Sela <amitsel...@gmail.com> wrote:
>>
>> I'll join JB's comment on the Spark runner: submitting Beam pipelines with
>> the Spark runner can be done using Spark's spark-submit script. Find out
>> more in the Spark runner documentation:
>> https://beam.apache.org/documentation/runners/spark/
>>
>> Amit.
>>
>> On Sun, Jan 22, 2017 at 8:03 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> Hi,
>>
>> Not directly DC/OS (I think Stephen did some tests on it), but I have a
>> platform running Spark and Flink with Beam on Mesos + Marathon.
>>
>> It basically doesn't require anything special, as running pipelines uses
>> spark-submit (as in Spark "natively").
>>
>> Regards
>> JB
>>
>> On 01/22/2017 12:56 AM, Chaoran Yu wrote:
>>> Hello all,
>>>
>>> Has anyone had experience using Beam on DC/OS? I want to run Beam code
>>> executed with the Spark runner on DC/OS. As a next step, I would like to
>>> run the Flink runner as well. There doesn't seem to be any information
>>> about running Beam on DC/OS that I can find on the web, so some pointers
>>> are greatly appreciated.
>>>
>>> Thank you,
>>>
>>> Chaoran Yu
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com