Re: [QUESTION] Why no auto labels?

2023-10-10 Thread Robert Bradshaw via dev
I would definitely support a PR making this an option. Changing the default would be a rather big change that would require more thought. On Tue, Oct 10, 2023 at 4:24 PM Joey Tran wrote: > Bump on this. Sorry to pester - I'm trying to get a few teams to adopt > Apache Beam at my company and I'm

Re: [QUESTION] Why no auto labels?

2023-10-10 Thread Joey Tran
Bump on this. Sorry to pester - I'm trying to get a few teams to adopt Apache Beam at my company and I'm trying to foresee parts of the API they might find inconvenient. If there's a conclusion to make the behavior similar to java, I'm happy to put up a PR On Thu, Oct 5, 2023, 12:49 PM Joey Tran

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Robert Bradshaw via dev
On Tue, Oct 10, 2023 at 4:05 PM Chamikara Jayalath wrote: > > On Tue, Oct 10, 2023 at 4:02 PM Robert Bradshaw > wrote: > >> On Tue, Oct 10, 2023 at 3:53 PM Chamikara Jayalath >> wrote: >> >>> >>> On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote: >>> I suspect some simple pattern

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Chamikara Jayalath via dev
On Tue, Oct 10, 2023 at 4:02 PM Robert Bradshaw wrote: > On Tue, Oct 10, 2023 at 3:53 PM Chamikara Jayalath > wrote: > >> >> On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote: >> >>> I suspect some simple pattern templating would solve most use cases. We >>> probably would want to support

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Robert Bradshaw via dev
On Tue, Oct 10, 2023 at 4:03 PM Robert Bradshaw wrote: > On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote: > >> I suspect some simple pattern templating would solve most use cases. >> > > That's what I'm leaning towards as well. > > >> We probably would want to support timestamp formatting

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Robert Bradshaw via dev
On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote: > I suspect some simple pattern templating would solve most use cases. > That's what I'm leaning towards as well. > We probably would want to support timestamp formatting (e.g. $ $M $D) > as well. > Although we have several timestamps to

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Robert Bradshaw via dev
On Tue, Oct 10, 2023 at 3:53 PM Chamikara Jayalath wrote: > > On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote: > >> I suspect some simple pattern templating would solve most use cases. We >> probably would want to support timestamp formatting (e.g. $ $M $D) as >> well. >> >> On Tue, Oct

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Byron Ellis via dev
That's a good point--- in the dbt case they're almost always treating that as a precomputation. I suppose a JinjaTransform isn't totally insane, but not sure I'd want to introduce Yet Another Way Of Writing A Lambda :-) On Tue, Oct 10, 2023 at 3:22 PM Robert Bradshaw wrote: > On Tue, Oct 10,

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Chamikara Jayalath via dev
On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote: > I suspect some simple pattern templating would solve most use cases. We > probably would want to support timestamp formatting (e.g. $ $M $D) as > well. > > On Tue, Oct 10, 2023 at 3:35 PM Robert Bradshaw > wrote: > >> On Mon, Oct 9, 2023

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Reuven Lax via dev
I suspect some simple pattern templating would solve most use cases. We probably would want to support timestamp formatting (e.g. $ $M $D) as well. On Tue, Oct 10, 2023 at 3:35 PM Robert Bradshaw wrote: > On Mon, Oct 9, 2023 at 3:09 PM Chamikara Jayalath > wrote: > >> I would say: >> >>
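As a rough illustration of the "simple pattern templating" idea discussed in this thread, here is a minimal Python sketch. It is not Beam's API: the function `expand_name`, the `${shard}` and `${num_shards}` placeholders, and the use of `strftime` directives for the timestamp are all assumptions made for this example.

```python
from datetime import datetime, timezone
from string import Template


def expand_name(pattern: str, shard: int, num_shards: int, ts: datetime) -> str:
    """Expand a hypothetical sink file name pattern for one shard."""
    # Let strftime fill in timestamp directives such as %Y, %m and %d.
    with_time = ts.strftime(pattern)
    # Substitute the shard placeholders (names are illustrative only).
    return Template(with_time).substitute(
        shard=f"{shard:05d}", num_shards=f"{num_shards:05d}"
    )


name = expand_name(
    "out/%Y/%m/%d/part-${shard}-of-${num_shards}.parquet",
    shard=3,
    num_shards=10,
    ts=datetime(2023, 10, 10, tzinfo=timezone.utc),
)
print(name)
```

Which timestamp to format (window start, window end, or processing time) is left open here, as it is in the thread.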

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Robert Bradshaw via dev
On Mon, Oct 9, 2023 at 3:09 PM Chamikara Jayalath wrote: > I would say: > > sink: > type: WriteToParquet > config: > path: /beam/filesytem/dest > prefix: > suffix: > > Underlying SDK will add the middle part of the file names to make sure > that files

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Robert Bradshaw via dev
On Tue, Oct 10, 2023 at 7:21 AM Kenneth Knowles wrote: > Another perspective: > > We should focus on the fact that FileIO writes what I would call a "big > file-based dataset" to a filesystem. The primary characteristic of a "big > file-based dataset" is that it is sharded and that the shards

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Robert Bradshaw via dev
On Tue, Oct 10, 2023 at 7:22 AM Byron Ellis via dev wrote: > FWIW dbt (which is also YAML and has this problem for other reasons) does > something like this. It also chooses to assume that everything is a string > but allows users to use the Jinja templating language to make those strings >

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-10 Thread Ahmet Altay via dev
Thank you for the information. I agree with Kenn in that case. This could wait for the next release. Unless there is another reason to do the RC2. On Tue, Oct 10, 2023 at 12:30 PM Yi Hu wrote: > > Would it impact all python users including breaking the new user, quick >> start experience? Or

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-10 Thread Yi Hu via dev
> Would it impact all python users including breaking the new user, quick > start experience? Or would it impact users of a specific IO or > configuration? > It is the latter. It will impact users of a specific IO (BigQueryIO read) with a specific configuration (Direct_Read). Note that the default

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-10 Thread Ahmet Altay via dev
Would it impact all python users including breaking the new user, quick start experience? Or would it impact users of a specific IO or configuration? If it is the former, I think it would be worth fixing it just to have a working new user experience. With new user experience I am thinking about

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-10 Thread Kenneth Knowles
After thinking this through a bit more, I am inclined to release RC1 with this noted as a known issue, unless there are other more compelling reasons to issue a second RC. Why? - It is more-or-less by design that end users of Beam Python have dependencies shift under them; breakage and

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Kenneth Knowles
Top-post comment: Aggregation of test results is hard. We've had a million threads on it. You want to have a clear green "this runner works" signal but you also want completely isolated "this runner works for Java" / "this runner works for Python" / etc signals. The tension between these is

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Robert Burke
Conversely, by unifying to Gradle command names, it also teaches how folks can run these things locally. Doesn't help entirely with discoverability, or initial scrutability, but it feels lower impedance than someone needing to look at the action manually to learn what it's running under the hood

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Danny McCormick via dev
> Just to clarify: I'm not proposing tying them to gradle tasks (I'm fine with `go test` for example) or doing this in situations where it is unnatural. > My example probably confused this because I left off the `./gradlew` just to save space. I'm proposing naming them after their obvious repro

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Robert Burke
+1 to the general proposal. I'm not bothered if something says a gradle command and in execution, that task ends up running multiple different commands. Arguably, if we're running a gradle task manually to prepare for a subsequent task that latter task should be adding the former to its

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Kenneth Knowles
On Tue, Oct 10, 2023 at 10:21 AM Danny McCormick via dev < dev@beam.apache.org> wrote: > I'm +1 on: > - standardizing our naming > - making job names match their commands exactly (though I'd still like the > `Run` prefix so that you can do things like say "Suite XYZ failed" without > triggering

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Byron Ellis via dev
FWIW dbt (which is also YAML and has this problem for other reasons) does something like this. It also chooses to assume that everything is a string but allows users to use the Jinja templating language to make those strings dynamic where needed. Syntactically I think that's a bit nicer to look at

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Danny McCormick via dev
I'm +1 on: - standardizing our naming - making job names match their commands exactly (though I'd still like the `Run` prefix so that you can do things like say "Suite XYZ failed" without triggering the automation) - removing pre/postcommit from the naming (we actually already run many precommits

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Kenneth Knowles
Another perspective: We should focus on the fact that FileIO writes what I would call a "big file-based dataset" to a filesystem. The primary characteristic of a "big file-based dataset" is that it is sharded and that the shards should not have any individual distinctiveness. The dataset should

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Yi Hu via dev
Thanks for raising this. This generally works, though some jobs run more than one gradle task (e.g. some IO_Direct_PreCommit run both :build (which executes unit tests) and :integrationTest). Another option is to normalize the naming of every job, saying the job name is X, then workflow name is

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Byron Ellis via dev
I'm +1 on standardizing the names and while I don't have a strong opinion on which standard (so long as it's only one) using the Gradle name seems like a perfectly good choice... I don't know the GHA setup well enough, but would that help maintain those? Presumably the various actions all

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Kenneth Knowles
Since I've been in GHA files lately... I think they have a very useful pattern which we could borrow from or learn from, where setting up the variables happens separately, like

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Kenneth Knowles
FWIW I am aware of the README in https://github.com/apache/beam/tree/master/.test-infra/jenkins that lists the phrases alongside the jobs. This is just wasted work to maintain IMO. Kenn On Tue, Oct 10, 2023 at 9:46 AM Kenneth Knowles wrote: > *Proposal:* make all the job names exactly match the

[PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Kenneth Knowles
*Proposal:* make all the job names exactly match the GH comment to run them and make it also as close as possible to how to reproduce locally *Example problems*: - We have really silly redundant job results like 'Chicago Taxi Example on Dataflow ("Run Chicago Taxi on Dataflow")' and

Beam High Priority Issue Report (43)

2023-10-10 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities. Unassigned P1 Issues: https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis