[PROPOSAL] Re-enable checkerframework by default

2022-10-19 Thread Kenneth Knowles
Hi all, Some time ago we turned off checker framework locally by default, and only turn it on with `-PenableCheckerFramework` and on Jenkins. My opinion is that this causes more headaches than it solves, by delaying finding out about errors. The increased compilation time of checkerframework

Re: Go, Java, & Python Project Starter / Example Using Terraform to build Dataflow Custom Templates

2022-10-19 Thread Damon Douglas via dev
Thank you so much Robert for pointing that out! I submitted a quick patch PR to correct this. On Wed, Oct 19, 2022 at 9:29 AM Robert Burke wrote: > Woohoo! Thanks Damon! This will be handy for Beam Go users on Dataflow. > > I have one note: It looks like the go.mod is requiring go 1.20, which >

Re: Go, Java, & Python Project Starter / Example Using Terraform to build Dataflow Custom Templates

2022-10-19 Thread Robert Burke
Woohoo! Thanks Damon! This will be handy for Beam Go users on Dataflow. I have one note: It looks like the go.mod is requiring go 1.20, which doesn't yet exist: https://github.com/GoogleCloudPlatform/professional-services/blob/main/examples/dataflow-custom-templates/go/go.mod#L3 The latest versio

Re: [DISCUSS] Jenkins -> GitHub Actions ?

2022-10-19 Thread Danny McCormick via dev
Thanks for kicking this conversation off. I'm +1 on migrating, but only once we've found a specific replacement for easy observability (which workflows have been failing lately, and how often) and trigger phrases (for retries and workflows that aren't automatically kicked off but should be run for

[DISCUSS] Jenkins -> GitHub Actions ?

2022-10-19 Thread Kenneth Knowles
Hi all, As you probably noticed, there's a lot of work going on around adding more GitHub Actions workflows. Can we fully migrate to GitHub Actions? Similar to our GitHub Issues migration (but less user-facing) it would bring us on to "default" infrastructure that more people understand and is ma

Re: Staging a PCollection in Beam | Dataflow Runner

2022-10-19 Thread Reuven Lax via dev
PCollections are usually persistent within a pipeline, so you can reuse them in other parts of a pipeline with no problem. There is no notion of state across pipelines - every pipeline is independent. If you want state across pipelines you can write the PCollection out to a set of files which ar

Beam High Priority Issue Report (49)

2022-10-19 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities. Unassigned P1 Issues: https://github.com/apache/beam/issues/23709 [Flake]: Spark batc

Re: Staging a PCollection in Beam | Dataflow Runner

2022-10-19 Thread Ravi Kapoor
On Wed, Oct 19, 2022 at 2:43 PM Ravi Kapoor wrote: > I am talking about in batch context. Can we do checkpointing in batch mode > as well? > I am *not* looking for any failure or retry algorithm. > The requirement is to simply materialize a PCollection which can be used > across the jobs /within

Re: Staging a PCollection in Beam | Dataflow Runner

2022-10-19 Thread Ravi Kapoor
I am talking about in batch context. Can we do checkpointing in batch mode as well? I am looking for any failure or retry algorithm. The requirement is to simply materialize a PCollection which can be used across the jobs /within the job in some view/temp table which is auto deleted I believe Res

Re: Staging a PCollection in Beam | Dataflow Runner

2022-10-19 Thread Israel Herraiz via dev
I think that would be a Reshuffle, but only within the context of the same job (e.g. if there is a failure and a retry, the retry would start from the checkpoint created by the reshuffle). In Dataflow,