Using the interfaces defined in pr/10992[1], I started migrating from
ProcessContext#updateWatermark to the WatermarkEstimators with pr/11126[2].
The PR is very WIP but it does include the necessary changes to the Watch
transform and also the UnboundedSource SDF wrapper to be able to report
waterm
The current set of watermark estimators in Apache Beam for UnboundedSource
are:
SQS - tracks the timestamp of the last unacked message (does not report
monotonically increasing watermarks and assumes that the system will make
sure to lower bound what is being reported)
AMP - tracks timestamp of las
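To make the "lower bound what is being reported" remark on the SQS entry concrete, here is a minimal sketch in plain Java of how a raw, possibly regressing timestamp could be clamped into a monotonically increasing watermark. The names WatermarkEstimator, LowerBoundedEstimator and observe are hypothetical and chosen for illustration only; this is not the interface from pr/10992 and it does not depend on the Beam SDK.

```java
import java.time.Instant;

public class MonotonicWatermarkSketch {

    /** Hypothetical minimal contract, for illustration; not the Beam interface. */
    interface WatermarkEstimator {
        Instant currentWatermark();
    }

    /**
     * Clamps raw reports so that the exposed watermark never moves backwards.
     * A source may observe timestamps out of order; this wrapper keeps the
     * largest value observed so far.
     */
    static final class LowerBoundedEstimator implements WatermarkEstimator {
        private Instant watermark;

        LowerBoundedEstimator(Instant initial) {
            this.watermark = initial;
        }

        /** Records a raw report; reports older than the current watermark are ignored. */
        void observe(Instant reported) {
            if (reported.isAfter(watermark)) {
                watermark = reported;
            }
        }

        @Override
        public Instant currentWatermark() {
            return watermark;
        }
    }

    public static void main(String[] args) {
        LowerBoundedEstimator estimator = new LowerBoundedEstimator(Instant.EPOCH);
        estimator.observe(Instant.ofEpochMilli(2000));
        estimator.observe(Instant.ofEpochMilli(1000)); // regression, ignored
        System.out.println(estimator.currentWatermark());
    }
}
```

In this sketch the clamping lives in the estimator itself; the SQS description above instead assumes the surrounding system performs it, so this only illustrates the idea and does not describe how Beam does it.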
Thanks for the explanation on Watch + FileIO, it is really clear. An extra
question related to WatermarkEstimator: is it supposed to be called in pipelines
at exactly the same moments that getWatermark is called today for unbounded
sources?
(slightly unrelated) There is an open JIRA for an issue related to wat
On Wed, Mar 4, 2020 at 7:36 AM Ismaël Mejía wrote:
> > I think we should move to a world where *all* runners become portable
> > runners. That doesn't mean they all need to use docker images, or even
> > GRPC, but I don't think having classical-only or classical-excluded
> > features is wher
On Wed, Mar 4, 2020 at 7:37 AM Ismaël Mejía wrote:
> > Bounded SDFs are allowed to have a method signature which has void as the
> > return type OR a ProcessContinuation. Unbounded SDFs must use a
> > ProcessContinuation as the return type. The "void" return case improves
> > ease of use since
> I think we should move to a world where *all* runners become portable runners.
> That doesn't mean they all need to use docker images, or even GRPC, but I
> don't think having classical-only or classical-excluded features is where we
> want to be long-term.
Robert I agree 100% with you, I dre
> Bounded SDFs are allowed to have a method signature which has void as the
> return type OR a ProcessContinuation. Unbounded SDFs must use a
> ProcessContinuation as the return type. The "void" return case improves ease
> of use since it is likely to be the common case for bounded SDFs.
Luke, th
On Tue, Mar 3, 2020 at 9:11 AM Ismaël Mejía wrote:
> > the unification of bounded/unbounded within SplittableDoFn has always
> > been a goal.
>
> I am glad to know that my intuition is correct and that this was
> envisioned, the idea of checkpoints for bounded inputs sounds really
> useful. E
On Tue, Mar 3, 2020 at 9:11 AM Ismaël Mejía wrote:
>
> > the unification of bounded/unbounded within SplittableDoFn has always been
> > a goal.
>
> I am glad to know that my intuition is correct and that this was envisioned,
> the idea of checkpoints for bounded inputs sounds really usef
> the unification of bounded/unbounded within SplittableDoFn has always been a
> goal.
I am glad to know that my intuition is correct and that this was envisioned, the
idea of checkpoints for bounded inputs sounds really useful. Eager to try
that in practice.
An explicit example (with a Wa
I don't have a strong preference for using a provider/having a set of
tightly coupled methods in Java, other than that we be consistent (and
we already use the methods style for restrictions).
On Mon, Mar 2, 2020 at 3:32 PM Luke Cwik wrote:
>
> Jan, there are some parts of Apache Beam the waterma
Jan, there are some parts of Apache Beam the watermarks package will likely
rely on (@Experimental annotation, javadoc links) but fundamentally should
not rely on core and someone could create a separate package for this.
Ismael, the unification of bounded/unbounded within SplittableDoFn has
alway
I just realized that the HBaseIO example is not a good one because we can
already have Watch like behavior as we do for Partition discovery in
HCatalogIO.
Still, I am interested in your views on bounded/unbounded unification.
Interesting question 2: How will these annotations connect with the Watch
t
Really interesting! Implementing the watermark correctly has been a common
struggle for IO authors, to the point that some IOs still have issues around
that. So +1 for this, in particular if we can reuse common patterns.
I was not aware of Boyuan's work around this, really nice.
One aspect
This is cool. Could the watermark estimators be packaged in a module
without additional dependencies? I think that it would be useful even
for projects outside of Beam, so it would be nice if these could use
this library without depending on Beam SDK itself.
Jan
On 2/28/20 12:50 AM, Luke Cwi
Python SDK also has a RestrictionProvider[1], that covers initial
splitting, sizing the restriction and providing the restriction coder.
I believe that keeping one as a provider while fully integrating the other
set as "new" DoFn style methods and parameters would be odd.
Kenn are you also suggest
Great idea.
Are any of the methods optional or useful on their own? It seems like maybe
not? So then a single annotation to return an object that returns all the
methods might be more clear. Per Boyuan's work - WatermarkEstimatorProvider?
Kenn
On Thu, Feb 27, 2020 at 2:43 PM Luke Cwik wrote:
>
See this doc[1] and blog[2] for some context about SplittableDoFns.
To support watermark reporting within the Java SDK for SplittableDoFns, we
need a way for SDF authors to report watermark estimates over the
element and restriction pair that they are processing.
For UnboundedSources, it was