Re: Question about integration tests for the examples classes

2022-03-22 Thread Fer Morales Martinez
Thanks for the reply, Kyle! I will skip streaming tests for the Spark runner for now. After *disabling* the *GameStatsIT* test, these are the results. At first sight it looks like all three runners successfully execute some or all tests, but upon closer inspection I realized those were just false

Flaky test issue report (51)

2022-03-22 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake). These are P1 issues because they have a major negative impact on the community and make it hard to determin

P1 issues report (73)

2022-03-22 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake). See https://beam.apache.

Re: Question about integration tests for the examples classes

2022-03-22 Thread Fer Morales Martinez
Totally!
Flink: https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Flink/8/testReport/org.apache.beam.examples.complete.game/
Direct: https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Direct/10/testReport/org.apache.beam.examples.complete.game/
Dataflow: https://ci-beam.apac

Re: Question about integration tests for the examples classes

2022-03-22 Thread Fer Morales Martinez
For some tests [1], [2] I see something along the lines of:

INFO: Job 2022-03-17_09_56_46-14482025605245680862 finished with status DONE.
Mar 17, 2022 5:02:44 PM org.apache.beam.runners.dataflow.TestDataflowRunner checkForPAssertSuccess
INFO: Success result for Dataflow job 2022-03-17_09_56_46-144

Re: CoGbkResult size sampling performance issues

2022-03-22 Thread Steve Niemitz
Oh, that's interesting, I didn't know it even had that optimization. I wonder if implementing `ElementByteSizeObservableIterable` in TagIterable would be the solution then? You're right, this does seem very brittle though. It seems like GroupByKey does the "right" thing, WindowReiterable extends th

Re: CoGbkResult size sampling performance issues

2022-03-22 Thread Steve Niemitz
Actually, I'm confused: in that example I linked, isn't it missing the part that hooks up the element observable to the returned iterator? ElementByteSizeObservableIterable does it in its implementation of iterator() [1], but WindowReiterable overrides iterator().

[1] https://github.com/apache/beam
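The pitfall described above (the base class attaches the size observers inside iterator(), so a subclass that overrides iterator() silently drops them) can be illustrated with a small, self-contained sketch. The classes ObservableIterable, WellBehavedIterable and OverridingIterable are hypothetical, simplified stand-ins; they do not reproduce the real signatures of Beam's ElementByteSizeObservableIterable or WindowReiterable, and string length is used as a stand-in for encoded byte size purely for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical stand-in for the "observable iterable" pattern: the base class
// attaches the registered size observers inside iterator().
abstract class ObservableIterable<T> implements Iterable<T> {
  private final List<LongConsumer> observers = new ArrayList<>();

  void addObserver(LongConsumer observer) {
    observers.add(observer);
  }

  // Subclasses supply the raw iterator; they are expected NOT to override iterator().
  protected abstract Iterator<T> createIterator();

  // Illustrative size estimate for one element (assumption, not Beam's logic).
  protected abstract long sizeOf(T element);

  @Override
  public Iterator<T> iterator() {
    Iterator<T> delegate = createIterator();
    // Wrap the raw iterator so every returned element is reported to the observers.
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return delegate.hasNext();
      }

      @Override
      public T next() {
        T value = delegate.next();
        long bytes = sizeOf(value);
        for (LongConsumer observer : observers) {
          observer.accept(bytes);
        }
        return value;
      }
    };
  }
}

// Follows the contract: observers see every element.
class WellBehavedIterable extends ObservableIterable<String> {
  private final List<String> data;

  WellBehavedIterable(List<String> data) {
    this.data = data;
  }

  @Override
  protected Iterator<String> createIterator() {
    return data.iterator();
  }

  @Override
  protected long sizeOf(String element) {
    return element.length();
  }
}

// Overrides iterator() directly, so the observer hookup in the base class is bypassed.
class OverridingIterable extends WellBehavedIterable {
  OverridingIterable(List<String> data) {
    super(data);
  }

  @Override
  public Iterator<String> iterator() {
    return createIterator();
  }
}
```

Iterating a WellBehavedIterable over "ab" and "cde" reports a total of 5 to a registered observer, while the same loop over an OverridingIterable reports nothing, which mirrors the missing hookup being discussed.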

Re: CoGbkResult size sampling performance issues

2022-03-22 Thread Steve Niemitz
I can submit a PR that hooks it up incorrectly(?) similar to how WindowReiterable works [1]. That will at least fix the performance issue, at the expense of making the estimated size wrong in the Dataflow UI. The correct implementation seems like it'd be more complicated, since you'd need to plu

Re: Question about integration tests for the examples classes

2022-03-22 Thread Fer Morales Martinez
Great! I will change the log retention policy just to double-check they are passing. Thanks again, Kyle!

On Tue, Mar 22, 2022 at 12:51 PM Kyle Weaver wrote:
> Those log messages are only printed by the Dataflow runner, so we
> shouldn't expect to see them printed by other runners.
> https://git

Re: Possible 2.36.0 regression with CoGbkResult

2022-03-22 Thread Steve Niemitz
Your email was actually what made me notice this! :D I haven't been able to reproduce the NPE you found (also on 2.37), but that certainly doesn't mean it's not a bug.

On Tue, Mar 22, 2022 at 5:23 PM Niel Markwick wrote:
> I have also seen this with Java beam 2.36.0 and 2.37.0, again with larg

Re: Possible 2.36.0 regression with CoGbkResult

2022-03-22 Thread Steve Niemitz
Oh, I just realized I responded to a different thread, feel free to ignore me.

On Tue, Mar 22, 2022 at 5:34 PM Niel Markwick wrote:
> yeah there does seem to be some heisenbug attributes...
> It failed on 4 out of 6 runs with this reproducer, and always succeeded
> with DirectRunner or 2.35.0...

Re: Deprecating ProcessContext in Java

2022-03-22 Thread Kenneth Knowles
Another aspect to what Reuven brought up is that it is quite difficult to dynamically create a DoFn now that it requires annotations or magic parameters for everything. For example if I have my own DSL and I want to compile it to a DoFn where the state / parameters of the DoFn depend on the input b

Re: Possible 2.36.0 regression with CoGbkResult

2022-03-22 Thread Claire McGinty
I was wondering if there’s any difference in how Dataflow v1 vs v2 loads CoGbkResult iterables. In one of our internal pipelines that started failing, I added some logging to CoGbkResult and found that the TagIterable’s iterator would return hasNext() == true, but next() == null.
- Claire

On Tu
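The symptom described above (an iterator reporting hasNext() == true and then returning null from next()) is the kind of thing a small logging wrapper can surface without changing the coder. NullReportingIterator below is a hypothetical debugging aid, not part of Beam and not the logging that was actually added to CoGbkResult; it wraps any Iterator and reports each null returned immediately after hasNext() returned true.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical debugging wrapper (not a Beam API): reports when the wrapped
// iterator returns null right after hasNext() returned true.
final class NullReportingIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private final Consumer<String> log;
  private boolean lastHasNext = false; // result of the most recent hasNext() call
  private long index = 0;              // position of the next element to be returned

  NullReportingIterator(Iterator<T> delegate, Consumer<String> log) {
    this.delegate = delegate;
    this.log = log;
  }

  @Override
  public boolean hasNext() {
    lastHasNext = delegate.hasNext();
    return lastHasNext;
  }

  @Override
  public T next() {
    T value = delegate.next();
    if (value == null && lastHasNext) {
      log.accept("next() returned null at index " + index
          + " right after hasNext() returned true");
    }
    lastHasNext = false; // require a fresh hasNext() before the next report
    index++;
    return value;
  }
}
```

Wrapping an iterator over "a", null, "b" and draining it with a hasNext()/next() loop produces exactly one report, for index 1. Since a NullableCoder lets such nulls through instead of failing, a wrapper like this is one way to locate where they first appear.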

Re: Possible 2.36.0 regression with CoGbkResult

2022-03-22 Thread Claire McGinty
(Forgot to add, this was with the transform coder wrapped with NullableCoder so it didn’t fail outright.)
- Claire

On Tue, Mar 22, 2022 at 11:26 PM Claire McGinty wrote:
> I was wondering if there’s any difference in how Dataflow v1 vs v2 loads
> CoGbkResult iterables - in one of our internal p