Re: [VOTE] Release 2.20.0, release candidate #2

2020-04-11 Thread Aaron Dixon
+1 - Ran full Clojure ports of all mobile gaming demos ([1],[2]): user score, hourly team score, leaderboard, game stats, and stateful team score using thurber[3] ...using 2.20.0 + JDK8 + Dataflow - Ran thurber test suite (DirectRunner, org.apache.beam.sdk.testing) which exercises

Re: Beam's Avro 1.8.x dependency

2020-01-17 Thread Aaron Dixon
@Tomo Suzuki Thanks for looking at this and your thorough analysis// let me know if I can help with any regression testing, CRs, or more context anything else. --- @Elliotte I was a little confused at first re: vendoring vs shading but here is how I understand it (happy to be corrected if

Re: Beam's Avro 1.8.x dependency

2020-01-16 Thread Aaron Dixon
thub.com/apache/beam/blob/master/vendor/README.md > >>> > > >>> > > >>> > Tomo Suzuki 于2020年1月16日周四 下午1:18写道: > >>> >> > >>> >> I've been upgrading dependencies around gRPC. This Avro-problem is > >>

Re: Beam's Avro 1.8.x dependency

2020-01-15 Thread Aaron Dixon
I meant to mention that we must use Avro 1.9.x as we rely on some schema resolution fixes not present in 1.8.x - so am indeed blocked. On Wed, Jan 15, 2020 at 8:50 PM Aaron Dixon wrote: > It looks like Avro version dependency from Beam has come up in the past > [1, 2]. > > I'm curre

Beam's Avro 1.8.x dependency

2020-01-15 Thread Aaron Dixon
It looks like Avro version dependency from Beam has come up in the past [1, 2]. I'm currently on Beam 2.16.0, which has been compatible with my usage of Avro 1.9.x. But upgrading to Beam 2.17.0 is not possible for us now that 2.17.0 has some dependencies on Avro classes only available in 1.8.x.

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
terintuitive in > accumulating mode, where the empty pane is not the identity element. > > Kenn > > *I don't recall if this is the default or not, and also because on phone > it is slow to look up. From your experience I think not default. > > On Mon, Jan 13, 2020, 15:03 Aa

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
are "optional"... ? On Mon, Jan 13, 2020 at 4:18 PM Aaron Dixon wrote: > Yes. Using calendar day-based windows and watermark is completely caught > up to today ... calendar window ends several days ago. I got EARLY panes > for each element but never ON_TIME pane. > > On

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
dow? > > On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon wrote: > >> The window is not empty fwiw; it has elements; I get an early firing pane >> for the window but well after the watermark passes there is no ON_TIME >> pane. Would this be a bug in Dataflow? Seems fundamenta

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
, 2020 at 3:58 PM Luke Cwik wrote: > I would have expected an empty on time pane since the default on time > behavior is FIRE_ALWAYS. > > On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon wrote: > >> Can anyone confirm? >> >> This is intermittent. Some (it seems, sparse

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
rantee to fire. > > > > -Rui > > On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon wrote: > >> I have the following trigger: >> >> .apply(Window >> .configure() >> .triggering(AfterWatermark >>.pastEndOfWindow(

No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
I have the following trigger: .apply(Window .configure() .triggering(AfterWatermark .pastEndOfWindow() .withEarlyFirings(AfterPane .elementCountAtLeast(1))) .accumulatingFiredPanes() .withAllowedLateness(Duration.ZERO) But in Dataflow

Re: outputWithTimestamp

2020-01-13 Thread Aaron Dixon
osing some > of these (if the skew ever happens to be larger than the static value you > set, you'll not be about to output the session). However now you know > you're limiting the skewed output to only those specific long sessions > you've chosen, which is much better than emitting all

Re: outputWithTimestamp

2020-01-12 Thread Aaron Dixon
ing properly, and might cause data to be dropped. Have > you considered instead using either triggers or timers to trigger your > aggregations? That way you don't need to wait for the watermark to advance > to the end of the window to trigger the aggregation, but the end-of-window &g

Re: outputWithTimestamp

2020-01-12 Thread Aaron Dixon
think this use case is interesting because it ...seems... to be a rather simple/distilled justification for being able to output data behind the watermark mid-stream.) [1] https://beam.apache.org/blog/2016/10/20/test-stream.html On Sat, Jan 11, 2020 at 10:10 PM Aaron Dixon wrote: > Oh nice—that will be

Re: outputWithTimestamp

2020-01-11 Thread Aaron Dixon
ou set a > timer using withOutputTimetstamp(t), the watermark will be held to t. > > On Sat, Jan 11, 2020 at 4:15 PM Aaron Dixon wrote: > >> Hi Reuven thanks for your quick reply >> >> I've tried that but the drag it puts on the watermark was too intrusive. >> For ex

Re: outputWithTimestamp

2020-01-11 Thread Aaron Dixon
ent), so you can .call outputWithTimestamp using > the CLICK GREEN timestamp without needing to set the allowed-lateness skew. > > Reuven > > On Sat, Jan 11, 2020 at 1:50 PM Aaron Dixon wrote: > >> I've just built a pipeline in Beam and after exploring several options >&

outputWithTimestamp

2020-01-11 Thread Aaron Dixon
I've just built a pipeline in Beam and after exploring several options for my use case, I've ended up relying on the deprecated .outputWithTimestamp() + DoFn#getAllowedTimestampSkew in what seems to me a quite valid use case. So I suppose this is a vote for un-deprecating this API (or a teachable

Re: Custom window invariants and

2020-01-11 Thread Aaron Dixon
interested in seeing how the windowing merge semantics get nailed down / evolve, and to explore what kind of innovative stuff could be ultimately done with them.. On Fri, Jan 10, 2020 at 4:38 PM Aaron Dixon wrote: > Once again this is a great help, thank you Kenneth > > On Wed, Jan 8, 2020

Re: Custom window invariants and

2020-01-10 Thread Aaron Dixon
bug or a mismatch based on the assumptions. Note that this code/logic is > shared by all runners. I do think you can write a WindowFn that induces it. > > Kenn > > *this was intended to be a performance optimization, but eagerly copying > the data turned out faster so now it is

Re: Custom window invariants and

2020-01-07 Thread Aaron Dixon
ate feature on this pipeline? Also, do > you have the code for your WindowFn? > > On Tue, Jan 7, 2020 at 12:05 PM Aaron Dixon wrote: > >> Dataflow. (See stacktrace) >> >> On Tue, Jan 7, 2020 at 1:50 PM Reuven Lax wrote: >> >>> Which runner are you using? >

Re: Custom window invariants and

2020-01-07 Thread Aaron Dixon
Dataflow. (See stacktrace) On Tue, Jan 7, 2020 at 1:50 PM Reuven Lax wrote: > Which runner are you using? > > On Tue, Jan 7, 2020, 11:17 AM Aaron Dixon wrote: > >> I get an IllegalStateException " is in more than one state >> address window set" (stacktrace

Custom window invariants and

2020-01-07 Thread Aaron Dixon
I get an IllegalStateException " is in more than one state address window set" (stacktrace below). What does this mean? What invariant of custom window implementation & merging am I violating? Thank you for any advise. ``` java.lang.IllegalStateException:

Re: session window puzzle

2019-12-08 Thread Aaron Dixon
have?) On Sat, Dec 7, 2019 at 10:54 PM Aaron Dixon wrote: > > In your proposed solution, it probably could be expressed as a new > merging WindowFn. You would assign each Green element to two tagged windows > that were GreenFromOrange and GreenToBlue type, and have a se

Re: session window puzzle

2019-12-07 Thread Aaron Dixon
eth Knowles wrote: > On Wed, Nov 13, 2019 at 7:39 PM Aaron Dixon wrote: > >> This is a great help. Thank you. I like the custom window solution >> pattern as a way to hold the watermark and merge down to keep the watermark >> where it is needed. Perhaps there is some inte

Re: real real-time beam

2019-12-06 Thread Aaron Dixon
/org/apache/beam/sdk/transforms/DoFn.java#L557 >>> >>> >>> On Mon, Nov 25, 2019 at 4:06 PM Pablo Estrada >>> wrote: >>> >>>> The blog posts on stateful and timely computation with Beam should help >>>> clarify a lot about how to use state and timers t

Re: real real-time beam

2019-11-25 Thread Aaron Dixon
; > b) use a buffer to buffer elements inside the outputting ParDo and > > only output them when watermark passes (using a timer). > > > > There is actually an ongoing discussion about how to make option b) > > user-friendly and part of Beam itself, but currently there

real real-time beam

2019-11-25 Thread Aaron Dixon
Suppose I trigger a Combine per-element (in a high-volume stream) and use a ParDo as a sink. I assume there is no guarantee about the order that my ParDo will see these triggers, especially as it processes in parallel, anyway. That said, my sink writes to a db or cache and I would not like the

Re: session window puzzle

2019-11-13 Thread Aaron Dixon
This should be doable with one. > > *SessionWindow plus EARLIEST holding up the watermark/pipeline was an > early complaint. That is part of why we switched the default to > end-of-window (also it is easier to understand and more efficient to > compute) > > Kenn > > On

Re: Custom Windowing and TimestampCombiner

2019-11-05 Thread Aaron Dixon
, 2019 at 10:55 AM Reuven Lax wrote: > > > On Tue, Nov 5, 2019 at 8:07 AM Aaron Dixon wrote: > >> I noticed that if I use TimestampCombiner/EARLIEST for session windows >> that the watermark appears to get held up for sessions that never "close" >> (or that

Custom Windowing and TimestampCombiner

2019-11-05 Thread Aaron Dixon
I noticed that if I use TimestampCombiner/EARLIEST for session windows that the watermark appears to get held up for sessions that never "close" (or that extend for a long time). But if I use default (TimestampCombiner/END_OF_WINDOW) the watermark doesn't get held. Does this mean that the

Re: aggregating over triggered results

2019-10-31 Thread Aaron Dixon
igger 'into' downstream aggregations, why is the API so friendly to doing just this? On Wed, Oct 30, 2019 at 5:37 PM Robert Bradshaw wrote: > On Tue, Oct 29, 2019 at 7:01 PM Aaron Dixon wrote: > > > > Thank you, Luke and Robert. Sorry for hitting dev@, I criss-crossed and > meant

Re: aggregating over triggered results

2019-10-29 Thread Aaron Dixon
alue once even if its in many windows. If that > > doesn't perform well, then come back to dev@ and look to optimize. > > > > On Tue, Oct 29, 2019 at 1:22 PM Aaron Dixon wrote: > >> > >> Hi I am new to Beam. > >> > >> I would like to accumulat

aggregating over triggered results

2019-10-29 Thread Aaron Dixon
Hi I am new to Beam. I would like to accumulate data over 30 day period and perform a running aggregation over this data, say every 10 minutes. I could use a sliding window of 30 days every 10 minutes (triggering at end of window) but this seems grossly inefficient (both in terms of # of windows