And, as Max says, this is an SDF that wraps a BoundedSource or an UnboundedSource, respectively. The other way around is not possible, as SDF is strictly more powerful.
On Thu, Jan 31, 2019 at 3:52 PM Robert Bradshaw <rober...@google.com> wrote:
>
> On Thu, Jan 31, 2019 at 3:17 PM Ismaël Mejía <ieme...@gmail.com> wrote:
> >
> > > Not necessarily. This would be one way. Another way is to build an SDF
> > > wrapper for UnboundedSource. Probably the easier path for migration.
> >
> > That would be fantastic. I have heard about such a wrapper multiple
> > times, but so far there is no realistic proposal. I have a hard time
> > imagining how we can map RestrictionTrackers onto the existing
> > Bounded/UnboundedSource in a generic way, so I would love to hear
> > more about the details.
>
> For BoundedSource, the restriction is the BoundedSource object itself
> (which splits into multiple distinct bounded sources), with a tracker
> that forwards split calls to the reader; the body of process would
> read from this reader to completion (never explicitly claiming
> positions from the reader).
>
> For unbounded sources, the restriction tracker always returns true on
> try_claim and false on try_split, and the process method returns current
> elements until either advance or try_claim returns false or try_split
> was called at 0; the checkpoint mark is returned as the checkpoint
> residual.
>
> Initial splitting and sizing just forward the calls.
>
> > On Thu, Jan 31, 2019 at 3:07 PM Maximilian Michels <m...@apache.org> wrote:
> > >
> > > > In addition to having support in the runners, this will require a
> > > > rewrite of PubsubIO to use the new SDF API.
> > >
> > > Not necessarily. This would be one way. Another way is to build an SDF
> > > wrapper for UnboundedSource. Probably the easier path for migration.
> > >
> > > On 31.01.19 14:03, Ismaël Mejía wrote:
> > > > > Fortunately, there is already a pending PR for cross-language
> > > > > pipelines which will allow us to use Java IO like PubSub in
> > > > > Python jobs.
> > > >
> > > > In addition to having support in the runners, this will require a
> > > > rewrite of PubsubIO to use the new SDF API.
> > > >
> > > > On Thu, Jan 31, 2019 at 12:23 PM Maximilian Michels <m...@apache.org> wrote:
> > > > >
> > > > > Hi Matthias,
> > > > >
> > > > > This is already reflected in the compatibility matrix, if you look
> > > > > under SDF. There is no UnboundedSource interface for portable
> > > > > pipelines. That's a legacy abstraction that will be replaced with SDF.
> > > > >
> > > > > Fortunately, there is already a pending PR for cross-language
> > > > > pipelines which will allow us to use Java IO like PubSub in
> > > > > Python jobs.
> > > > >
> > > > > Thanks,
> > > > > Max
> > > > >
> > > > > On 31.01.19 12:06, Matthias Baetens wrote:
> > > > > >
> > > > > > Hey Ankur,
> > > > > >
> > > > > > Thanks for the swift reply. Should I change this in the capability
> > > > > > matrix <https://s.apache.org/apache-beam-portability-support-table>
> > > > > > then?
> > > > > >
> > > > > > Many thanks.
> > > > > > Best,
> > > > > > Matthias
> > > > > >
> > > > > > On Thu, 31 Jan 2019 at 09:31, Ankur Goenka <goe...@google.com> wrote:
> > > > > > >
> > > > > > > Hi Matthias,
> > > > > > >
> > > > > > > Unfortunately, unbounded reads including Pub/Sub are not yet
> > > > > > > supported for portable runners.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Ankur
> > > > > > >
> > > > > > > On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens
> > > > > > > <baetensmatth...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > Last few days I have been trying to run a streaming pipeline
> > > > > > > > (code on GitHub <https://github.com/matthiasa4/beam-demo>)
> > > > > > > > on a Flink Runner.
> > > > > > > >
> > > > > > > > I am running a Flink cluster locally (v1.5.6
> > > > > > > > <https://flink.apache.org/downloads.html>).
> > > > > > > >
> > > > > > > > I have built the SDK Harness Container:
> > > > > > > >     ./gradlew :beam-sdks-python-container:docker
> > > > > > > > and started the JobServer:
> > > > > > > >     ./gradlew :beam-runners-flink_2.11-job-server:runShadow -PflinkMasterUrl=localhost:8081
> > > > > > > >
> > > > > > > > I run my pipeline with:
> > > > > > > >     env/bin/python streaming_pipeline.py --runner=PortableRunner
> > > > > > > >         --job_endpoint=localhost:8099 --output xxx
> > > > > > > >         --input_subscription xxx --output_subscription xxx
> > > > > > > >
> > > > > > > > All this is running inside Ubuntu (Bionic) in VirtualBox.
> > > > > > > >
> > > > > > > > The job submits fine, but unfortunately fails after a few
> > > > > > > > seconds with the error attached.
> > > > > > > >
> > > > > > > > Anything I am missing or doing wrong?
> > > > > > > >
> > > > > > > > Many thanks.
> > > > > > > > Best,
> > > > > > > > Matthias
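Robert's description of the UnboundedSource-as-SDF wrapper above can be sketched in plain Python. This is only a model of the try_claim/try_split semantics he outlines, with no Beam dependency; `UnboundedRestrictionTracker`, `FakeReader`, and `process` are hypothetical names for illustration, not real Beam APIs:

```python
class UnboundedRestrictionTracker:
    """Models the tracker Robert describes: try_claim always succeeds
    until try_split(0) requests a checkpoint, after which processing
    must stop and the reader's checkpoint mark becomes the residual."""

    def __init__(self):
        self.checkpoint_requested = False

    def try_claim(self, position):
        # Always true, unless a checkpoint (split at 0) was requested.
        return not self.checkpoint_requested

    def try_split(self, fraction):
        # Only a split at fraction 0 (a checkpoint request) is honored.
        if fraction == 0:
            self.checkpoint_requested = True
            return True
        return False


class FakeReader:
    """Stands in for an UnboundedSource reader: advance() returns False
    when no element is currently available."""

    def __init__(self, elements):
        self._elements = list(elements)
        self._pos = 0

    def advance(self):
        if self._pos < len(self._elements):
            self._pos += 1
            return True
        return False

    def current(self):
        return self._elements[self._pos - 1]

    def get_checkpoint_mark(self):
        # Offset already consumed; resuming from here continues the read.
        return self._pos


def process(reader, tracker):
    """Emit elements until advance() or try_claim() fails, then return
    the emitted elements plus the checkpoint mark as the residual."""
    emitted = []
    while tracker.try_claim(reader.get_checkpoint_mark()) and reader.advance():
        emitted.append(reader.current())
    return emitted, reader.get_checkpoint_mark()
```

For example, draining a reader holding three elements yields those elements and a residual mark of 3, while calling `try_split(0)` before processing makes the first `try_claim` fail, so nothing is emitted and the mark stays at 0 for a later resume.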