Thank you, Amit, for the reply.

I just tried two more runners and below is a summary:

DirectRunner: works.
FlinkRunner: works in local mode; in cluster mode it fails with the error
“Communication with JobManager failed: lost connection to the JobManager”.
SparkRunner: works in local mode (run with mvn exec) but fails in cluster mode
(submitted with spark-submit) with the error I pasted in the previous email; a
sketch of my setup follows below.
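
For reference, below is roughly how the pipeline is set up for the SparkRunner
(the class name and the write step are made up for illustration, and the exact
options passed via spark-submit or mvn exec may differ, but the read is the same
TextIO read of the gs:// sample file, using the 0.x-style TextIO.Read API):

    import org.apache.beam.runners.spark.SparkPipelineOptions;
    import org.apache.beam.runners.spark.SparkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Hypothetical class name, for illustration only.
    public class GcsReadPipeline {
      public static void main(String[] args) {
        // Parse --runner=SparkRunner, --sparkMaster=..., etc. from the args that
        // spark-submit (or mvn exec) passes through to the program.
        SparkPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(SparkPipelineOptions.class);
        options.setRunner(SparkRunner.class);

        Pipeline p = Pipeline.create(options);

        // The read that fails on the executors with "Unable to find handler for
        // gs://..."; the write target below is just a placeholder.
        p.apply("ReadSample", TextIO.Read.from("gs://beam-samples/sample.txt"))
         .apply("WriteCopy", TextIO.Write.to("/tmp/sample-copy"));

        p.run();
      }
    }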

In the SparkRunner’s case, could it be that the Spark executors can’t access the
gs:// file in Google Cloud Storage?
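
To help narrow down whether it is a storage-access issue or a missing handler
registration on the executors, I was thinking of running a small check like the
one below (just a sketch: the class name is made up, and the factory-registration
call is an assumption based on the 0.x SDK, so the exact method name may differ
in other Beam versions):

    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.util.IOChannelUtils;

    // Hypothetical one-off diagnostic, not part of the real pipeline.
    public class GcsHandlerCheck {
      public static void main(String[] args) throws Exception {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

        // Register the SDK's standard IO factories the way a pipeline run would;
        // this method name is an assumption and may differ between Beam versions.
        IOChannelUtils.registerStandardIOFactories(options);

        // Same lookup as in the quoted stack trace: if the gs:// handler from the
        // google-cloud-platform module is not on this JVM's classpath, this
        // throws "Unable to find handler for gs://...".
        System.out.println(
            IOChannelUtils.getFactory("gs://beam-samples/sample.txt").getClass());
      }
    }

If that lookup fails on the cluster but succeeds locally, it would point at the
jar shipped to the executors (e.g. missing GCS extension classes) rather than at
Google Cloud Storage permissions.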

Thank you,



> On Jan 23, 2017, at 3:28 PM, Amit Sela <amitsel...@gmail.com> wrote:
> 
> Is this working for you with other runners? Judging by the stack trace, it
> seems like IOChannelUtils fails to find a handler, so it doesn't seem to be
> a Spark-specific problem.
> 
> On Mon, Jan 23, 2017 at 8:50 PM Chaoran Yu <chaoran...@lightbend.com> wrote:
> Thank you Amit and JB! 
> 
> This is not related to DC/OS itself, but I ran into a problem when launching
> a Spark job on a cluster with spark-submit. My Spark job written in Beam
> can’t read the specified gs:// file. I got the following error:
> 
> Caused by: java.io.IOException: Unable to find handler for gs://beam-samples/sample.txt
>       at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
>       at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:528)
>       at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:271)
>       at org.apache.beam.runners.spark.io.SourceRDD$Bounded$1.hasNext(SourceRDD.java:125)
> 
> Then I thought about switching to reading from another source, but I saw in
> Beam’s documentation that TextIO can only read from files in Google Cloud
> Storage (prefixed with gs://) when running in cluster mode. How do you guys
> do file IO in Beam when using the SparkRunner?
> 
> 
> Thank you,
> Chaoran
> 
> 
>> On Jan 22, 2017, at 4:32 AM, Amit Sela <amitsel...@gmail.com> wrote:
>> 
>> I'll join JB's comment on the Spark runner: submitting Beam pipelines using
>> the Spark runner can be done with Spark's spark-submit script; find out more
>> in the Spark runner documentation
>> <https://beam.apache.org/documentation/runners/spark/>.
>> 
>> Amit.
>> 
>> On Sun, Jan 22, 2017 at 8:03 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> Hi,
>> 
>> Not directly DC/OS (I think Stephen did some tests on it), but I have a
>> platform running Spark and Flink with Beam on Mesos + Marathon.
>> 
>> It basically doesn't involve anything special, as running pipelines uses
>> spark-submit (as in Spark "natively").
>> 
>> Regards
>> JB
>> 
>> On 01/22/2017 12:56 AM, Chaoran Yu wrote:
>> > Hello all,
>> >
>> > Has anyone had experience using Beam on DC/OS? I want to run Beam code
>> > executed with Spark runner on DC/OS. As a next step, I would like to run
>> > the Flink runner as well. There doesn't seem to exist any information
>> > about running Beam on DC/OS I can find on the web. So some pointers are
>> > greatly appreciated.
>> >
>> > Thank you,
>> >
>> > Chaoran Yu
>> >
>> 
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
> 
