Re: Java SplittableDoFn Watermark API

2020-03-13 Thread Luke Cwik
Using the interfaces defined in pr/10992[1], I started migrating from ProcessContext#updateWatermark to the WatermarkEstimators with pr/11126[2]. The PR is very WIP but it does include the necessary changes to the Watch transform and also the UnboundedSource SDF wrapper to be able to report waterm

Re: Java SplittableDoFn Watermark API

2020-03-09 Thread Luke Cwik
The current set of watermark estimators in Apache Beam for UnboundedSource are: SQS - tracks the timestamp of the last unacked message (does not report monotonically increasing watermarks and assumes that the system will make sure to lower bound what is being reported) AMP - tracks timestamp of las
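A minimal sketch of the lower-bounding idea mentioned for SQS above: raw estimates may move backwards, and a wrapper keeps the reported watermark from regressing. The class and method names here are hypothetical and written only for illustration; this is not Beam's WatermarkEstimator API.

```java
import java.time.Instant;

public class MonotonicWatermarkSketch {

  /** Hypothetical holder (not a Beam class) that never lets the watermark go backwards. */
  static final class ClampedWatermark {
    private Instant current;

    ClampedWatermark(Instant initial) {
      this.current = initial;
    }

    /** Accepts a raw, possibly regressing estimate and returns the clamped watermark. */
    Instant report(Instant rawEstimate) {
      if (rawEstimate.isAfter(current)) {
        current = rawEstimate;
      }
      return current;
    }
  }

  public static void main(String[] args) {
    ClampedWatermark watermark = new ClampedWatermark(Instant.EPOCH);
    // The second raw estimate is older than the first, so the reported value holds steady.
    System.out.println(watermark.report(Instant.ofEpochSecond(10)));
    System.out.println(watermark.report(Instant.ofEpochSecond(5)));
    System.out.println(watermark.report(Instant.ofEpochSecond(12)));
  }
}
```

Whether the clamping lives in the estimator or in the runner is a design choice; the sketch only shows the clamping itself.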

Re: Java SplittableDoFn Watermark API

2020-03-09 Thread Ismaël Mejía
Thanks for the explanation on Watch + FileIO; it is really clear. An extra question related to WatermarkEstimator: is it supposed to be called in pipelines at exactly the same moments that getWatermark is called today for unbounded sources? (Slightly unrelated) There is an open JIRA for an issue related to wat

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Luke Cwik
On Wed, Mar 4, 2020 at 7:36 AM Ismaël Mejía wrote: > > I think we should move to a world where *all* runners become portable runners. That doesn't mean they all need to use Docker images, or even gRPC, but I don't think having classical-only or classical-excluded features is wher

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Luke Cwik
On Wed, Mar 4, 2020 at 7:37 AM Ismaël Mejía wrote: > > Bounded SDFs are allowed to have a method signature which has void as the return type OR a ProcessContinuation. Unbounded SDFs must use a ProcessContinuation as the return type. The "void" return case improves ease of use since

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Ismaël Mejía
> I think we should move to a world where *all* runners become portable runners. That doesn't mean they all need to use Docker images, or even gRPC, but I don't think having classical-only or classical-excluded features is where we want to be long-term. Robert, I agree 100% with you, I dre

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Ismaël Mejía
> Bounded SDFs are allowed to have a method signature which has void as the return type OR a ProcessContinuation. Unbounded SDFs must use a ProcessContinuation as the return type. The "void" return case improves ease of use since it is likely to be the common case for bounded SDFs. Luke, th
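A rough sketch of the two return styles described in the quoted text, assuming a simple offset-range restriction. The Continuation enum and both method names are hypothetical stand-ins written for this illustration; they are not Beam's ProcessContinuation or a real @ProcessElement signature.

```java
import java.util.ArrayList;
import java.util.List;

public class ContinuationSketch {

  /** Hypothetical stand-in for a stop/resume signal; not Beam's ProcessContinuation. */
  enum Continuation { STOP, RESUME }

  /** Bounded style: the whole range [from, to) is processed, so there is nothing to return. */
  static void processBounded(long from, long to, List<Long> out) {
    for (long i = from; i < to; i++) {
      out.add(i);
    }
  }

  /**
   * Unbounded style: emits the offsets that exist so far (below {@code available}) within
   * [from, end), then tells the caller whether it should come back later for more.
   */
  static Continuation processUnbounded(long from, long available, long end, List<Long> out) {
    long limit = Math.min(available, end);
    for (long i = from; i < limit; i++) {
      out.add(i);
    }
    return available >= end ? Continuation.STOP : Continuation.RESUME;
  }

  public static void main(String[] args) {
    List<Long> bounded = new ArrayList<>();
    processBounded(0, 3, bounded);
    System.out.println(bounded);

    List<Long> unbounded = new ArrayList<>();
    System.out.println(processUnbounded(0, 3, 5, unbounded)); // only part of the range exists yet
    System.out.println(processUnbounded(3, 7, 5, unbounded)); // the rest of the range has arrived
    System.out.println(unbounded);
  }
}
```

The void form keeps the common bounded case free of boilerplate, while the explicit return value gives an unbounded reader a way to ask to be resumed.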

Re: Java SplittableDoFn Watermark API

2020-03-03 Thread Luke Cwik
On Tue, Mar 3, 2020 at 9:11 AM Ismaël Mejía wrote: > > the unification of bounded/unbounded within SplittableDoFn has always been a goal. > I am glad to know that my intuition is correct and that this was envisioned; the idea of checkpoints for bounded inputs sounds really useful. E

Re: Java SplittableDoFn Watermark API

2020-03-03 Thread Robert Bradshaw
On Tue, Mar 3, 2020 at 9:11 AM Ismaël Mejía wrote: > > the unification of bounded/unbounded within SplittableDoFn has always been a goal. > I am glad to know that my intuition is correct and that this was envisioned; the idea of checkpoints for bounded inputs sounds really usef

Re: Java SplittableDoFn Watermark API

2020-03-03 Thread Ismaël Mejía
> the unification of bounded/unbounded within SplittableDoFn has always been a goal. I am glad to know that my intuition is correct and that this was envisioned; the idea of checkpoints for bounded inputs sounds really useful. Eager to try that in practice. An explicit example (with a Wa

Re: Java SplittableDoFn Watermark API

2020-03-02 Thread Robert Bradshaw
I don't have a strong preference for using a provider/having a set of tightly coupled methods in Java, other than that we be consistent (and we already use the methods style for restrictions). On Mon, Mar 2, 2020 at 3:32 PM Luke Cwik wrote: > > Jan, there are some parts of Apache Beam the waterma

Re: Java SplittableDoFn Watermark API

2020-03-02 Thread Luke Cwik
Jan, there are some parts of Apache Beam the watermarks package will likely rely on (@Experimental annotation, javadoc links), but fundamentally it should not rely on core, and someone could create a separate package for this. Ismael, the unification of bounded/unbounded within SplittableDoFn has alway

Re: Java SplittableDoFn Watermark API

2020-02-28 Thread Ismaël Mejía
I just realized that the HBaseIO example is not a good one because we can already have Watch-like behavior, as we do for partition discovery in HCatalogIO. Still, I am interested in your views on bounded/unbounded unification. Interesting question 2: How will these annotations connect with the Watch t

Re: Java SplittableDoFn Watermark API

2020-02-28 Thread Ismaël Mejía
Really interesting! Implementing the watermark correctly has been a common struggle for IO authors, to the point that some IOs still have issues around that. So +1 for this, in particular if we can reuse common patterns. I was not aware of Boyuan's work around this, really nice. One aspect

Re: Java SplittableDoFn Watermark API

2020-02-28 Thread Jan Lukavský
This is cool. Could the watermark estimators be packaged in a module without additional dependencies? I think that it would be useful even for projects outside of Beam, so it would be nice if these could use this library without depending on the Beam SDK itself. Jan On 2/28/20 12:50 AM, Luke Cwi

Re: Java SplittableDoFn Watermark API

2020-02-27 Thread Luke Cwik
The Python SDK also has a RestrictionProvider[1] that covers initial splitting, sizing the restriction, and providing the restriction coder. I believe that keeping one as a provider while fully integrating the other set as "new" DoFn-style methods and parameters would be odd. Kenn, are you also suggest

Re: Java SplittableDoFn Watermark API

2020-02-27 Thread Kenneth Knowles
Great idea. Are any of the methods optional or useful on their own? It seems like maybe not? So then a single annotation to return an object that returns all the methods might be more clear. Per Boyuan's work - WatermarkEstimatorProvider? Kenn On Thu, Feb 27, 2020 at 2:43 PM Luke Cwik wrote: >

Java SplittableDoFn Watermark API

2020-02-27 Thread Luke Cwik
See this doc[1] and blog[2] for some context about SplittableDoFns. To support watermark reporting within the Java SDK for SplittableDoFns, we need a way for SDF authors to report watermark estimates over the element and restriction pair that they are processing. For UnboundedSources, it was