Re: Transform-specific thread pools in Python

2021-05-11 Thread Stephan Hoyer
On Mon, May 10, 2021 at 4:28 PM Ahmet Altay wrote: > > > On Mon, May 10, 2021 at 8:01 AM Stephan Hoyer wrote: > >> Hi Beam devs, >> >> I've been exploring recently how to optimize IO bound steps for my Python >> Beam pipelines, and have come up with a solution that I think might make >> sense

Beam Summit 2021 - Call for Proposals

2021-05-11 Thread Mara Ruvalcaba
Hi Beam Community! We are excited to announce Beam Summit 2021! Beam Summit will happen from August 4th - 6th, 2021, and it will be held online. We want to hear from your experience with Beam!!! You are more than welcome to share with the community, send a proposal

Flaky test issue report

2021-05-11 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests. These are P1 issues because they have a major negative impact on the community and make it hard to determine the quality of the software. BEAM-12322: FnApiRunnerTestWithGrpcAndMultiWorkers flaky (py precommit)

P1 issues report

2021-05-11 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky tests. See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the meaning and expectations around P1 issues. BEAM-12324: TranslationsTest.test_run_packable_combine_* failing on

BeamSQL: Error when using WHERE statements with OVER windows

2021-05-11 Thread Burkay Gur
Hi folks, When we try to run the following query on BeamSQL: SELECT item, purchases, category, sum(purchases) over (PARTITION BY category ORDER BY purchases ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as total_purchases FROM PCOLLECTION WHERE purchases > 3 We are getting the following
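For reference, the result this query is meant to produce can be sketched in plain Python, without Beam. This is only an illustration of the intended semantics, assuming standard SQL evaluation order (WHERE filters rows before the window function runs). The sample rows and the helper name `running_totals` are hypothetical and do not come from the original message.

```python
from collections import defaultdict
from itertools import accumulate


def running_totals(rows):
    """Filter rows with purchases > 3, then add a per-category running sum."""
    # WHERE purchases > 3 (assumed to apply before the window function).
    kept = [r for r in rows if r["purchases"] > 3]

    # PARTITION BY category
    by_category = defaultdict(list)
    for r in kept:
        by_category[r["category"]].append(r)

    out = []
    for group in by_category.values():
        # ORDER BY purchases
        group.sort(key=lambda r: r["purchases"])
        # ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is a running sum.
        totals = accumulate(r["purchases"] for r in group)
        for r, total in zip(group, totals):
            out.append({**r, "total_purchases": total})
    return out


# Hypothetical sample data, for illustration only.
rows = [
    {"item": "apple", "purchases": 5, "category": "fruit"},
    {"item": "pear", "purchases": 2, "category": "fruit"},
    {"item": "plum", "purchases": 4, "category": "fruit"},
    {"item": "kale", "purchases": 7, "category": "veg"},
]
result = running_totals(rows)
```

Here "pear" is dropped by the filter before any sums are computed, so it contributes nothing to the running total of its category.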

Re: [EXT] Re: [EXT] Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-05-11 Thread Kenneth Knowles
+dev In the Beam Java ecosystem, this functionality is provided by the Sorter library (https://beam.apache.org/documentation/sdks/java-extensions/#sorter). I'm curious what people think about various options: - Python version of the transform(s) - Expose sorter as xlang transform(s) -

Re: Some questions around GroupIntoBatches

2021-05-11 Thread Reuven Lax
On Tue, May 11, 2021 at 9:01 AM Kenneth Knowles wrote: > > > On Mon, May 10, 2021 at 7:40 PM Reuven Lax wrote: > >> Hi, >> >> I've been looking at the implementation of GroupIntoBatches (hoping to >> add support to group based on byte size), and I have a few questions about >> the current

Re: Upgrading vendored gRPC from 1.26.0 to 1.36.0

2021-05-11 Thread Tomo Suzuki
Thank you for the advice. Yes, the latch not being counted-down is the problem. (my memo: https://github.com/apache/beam/pull/14474#discussion_r619557479 ) I'll need to figure out why withOnError is not called. > Can you repro locally? No, the task succeeds in my environment (./gradlew

Re: LGPL-2.1 in beam-vendor-grpc

2021-05-11 Thread Kenneth Knowles
+1 It seems we are pretty close on the upgrade. The same tricky problem as before, but it seems to be narrowed down. Kenn On Mon, May 10, 2021 at 8:26 AM Jean-Baptiste Onofre wrote: > +1 fully agree. > > Regards > JB > > On May 10, 2021, at 16:02, Jan Lukavský wrote: > > +1 for blocking the

Re: Upgrading vendored gRPC from 1.26.0 to 1.36.0

2021-05-11 Thread Kenneth Knowles
I am not sure how much you read the code of the test. So apologies if I am saying things you already know. The test does something like: - start a logging service - set up some stub clients, each with onError wired up to release a countdown latch - send error responses to all three of them
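The test shape listed so far can be sketched in Python to show why a missing error callback makes the test hang. This is not Beam's Java test: `CountDownLatch` and `StubClient` are hypothetical stand-ins written for this sketch, and the logging service is replaced by direct calls.

```python
import threading


class CountDownLatch:
    """Minimal latch: wait() returns True once count_down() was called `count` times."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def count_down(self):
        with self._cond:
            if self._count > 0:
                self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait(self, timeout=None):
        with self._cond:
            # wait_for returns False if the timeout expires first.
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)


class StubClient:
    """Hypothetical client that forwards errors to an on_error callback."""

    def __init__(self, on_error):
        self._on_error = on_error

    def receive_error(self, error):
        self._on_error(error)


# Each client's on_error releases the latch once.
latch = CountDownLatch(3)
clients = [StubClient(on_error=lambda err: latch.count_down()) for _ in range(3)]

# Send an error response to all three clients, each from its own thread.
threads = [
    threading.Thread(target=c.receive_error, args=(RuntimeError("boom"),))
    for c in clients
]
for t in threads:
    t.start()
for t in threads:
    t.join()
all_errors_seen = latch.wait(timeout=1.0)

# If one callback is never invoked, the latch never reaches zero and the wait times out.
stuck = CountDownLatch(3)
stuck.count_down()
stuck.count_down()
timed_out = not stuck.wait(timeout=0.05)
```

The second half mirrors the reported symptom: with only two of three callbacks firing, the wait can only end by timing out.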

Re: Ordered PCollections eventually?

2021-05-11 Thread Jan Lukavský
I'll just remind everyone that Beam already supports (experimental) @RequiresTimeSortedInput (which has several limitations, mostly in that it orders only by timestamp and not by some time-related user field; and, of course, missing retractions). Arbitrary sorting seems to be hard, even per-key, it

Re: Some questions around GroupIntoBatches

2021-05-11 Thread Kenneth Knowles
On Mon, May 10, 2021 at 7:40 PM Reuven Lax wrote: > Hi, > > I've been looking at the implementation of GroupIntoBatches (hoping to add > support to group based on byte size), and I have a few questions about the > current implementation. > > 1. I noticed that the transform does not preserve
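As a rough illustration of what grouping by byte size could mean, here is a plain-Python sketch that closes a batch when adding the next element would exceed a byte limit. It is not the GroupIntoBatches implementation and uses no Beam APIs; `batch_by_byte_size`, `max_batch_bytes`, and `size_fn` are hypothetical names chosen for this sketch.

```python
def batch_by_byte_size(elements, max_batch_bytes, size_fn=len):
    """Yield lists of elements whose total size stays within max_batch_bytes.

    An element larger than the limit is emitted in a batch of its own.
    """
    batch, batch_bytes = [], 0
    for element in elements:
        element_bytes = size_fn(element)
        # Flush first if adding this element would exceed the limit.
        if batch and batch_bytes + element_bytes > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(element)
        batch_bytes += element_bytes
    if batch:
        yield batch


# Hypothetical input: byte strings of sizes 4, 2, 6 and 1 with a 6-byte limit.
batches = list(
    batch_by_byte_size([b"aaaa", b"bb", b"cccccc", b"d"], max_batch_bytes=6)
)
```

A count-based batcher would apply the same flush rule with a size of 1 per element, which is one way to think about how the two limits relate.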

Re: Ordered PCollections eventually?

2021-05-11 Thread Kenneth Knowles
Per-key ordered delivery makes a ton of sense. I'd guess CDC has the same needs as retractions, so that the changelog can be applied in order as it arrives. And since it is per-key you still get parallelism. Global ordering is quite different. I know that SQL and Dataframes have global sorting