Re: [QUESTION] Why no auto labels?

2023-10-20 Thread Robert Bradshaw via dev
I'll take another look at the PR. My inclination is still to use uuids to uniquify. I think that's worth the cost to the readability hit (I'm OK reducing this down to 6-8 hex digits which will still give very low chances of collisions, though it doesn't solve the first one). If someone cares about

Re: [QUESTION] Why no auto labels?

2023-10-20 Thread Joey Tran
Just want to bump this discussion again. I'm introducing Beam to other developers at my Schrodinger now and the first (of hopefully many!) developer has started migrating our internal workflows to Beam. As I suspected though, he's complained about the iteration cycles spent from using the same

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Joey Tran
On Fri, Oct 13, 2023 at 1:18 PM Robert Bradshaw wrote: > On Fri, Oct 13, 2023 at 10:08 AM Joey Tran > wrote: > Are there places on the SDK side that expect unique labels? Or in >> non-updateable runners? >> > > That's a good question. The label eventually ends up here: >

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Robert Bradshaw via dev
On Fri, Oct 13, 2023 at 10:08 AM Joey Tran wrote: > That makes sense. Would you suggest the new option simply suppress the > RuntimeError and use the non-unique label? > Yes. (Or, rather, not raise it.) > Are there places on the SDK side that expect unique labels? Or in > non-updateable

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Joey Tran
That makes sense. Would you suggest the new option simply suppress the RuntimeError and use the non-unique label? Are there places on the SDK side that expect unique labels? Or in non-updateable runners? Would making `--auto_unique_labels` a mutually exclusive option with streaming be a

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Robert Bradshaw via dev
Thanks for the PR. I think we should follow Java and allow non-unique labels, but not provide automatic uniquification, In particular, the danger of using a counter is that one can get accidental (and potentially hard to check) off-by-one collisions. As a concrete example, imagine one partitions

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Joey Tran
For posterity: https://github.com/apache/beam/pull/28984 On Tue, Oct 10, 2023 at 7:29 PM Robert Bradshaw wrote: > I would definitely support a PR making this an option. Changing the > default would be a rather big change that would require more thought. > > On Tue, Oct 10, 2023 at 4:24 PM Joey

Re: [QUESTION] Why no auto labels?

2023-10-10 Thread Robert Bradshaw via dev
I would definitely support a PR making this an option. Changing the default would be a rather big change that would require more thought. On Tue, Oct 10, 2023 at 4:24 PM Joey Tran wrote: > Bump on this. Sorry to pester - I'm trying to get a few teams to adopt > Apache Beam at my company and I'm

Re: [QUESTION] Why no auto labels?

2023-10-10 Thread Joey Tran
Bump on this. Sorry to pester - I'm trying to get a few teams to adopt Apache Beam at my company and I'm trying to foresee parts of the API they might find inconvenient. If there's a conclusion to make the behavior similar to java, I'm happy to put up a PR On Thu, Oct 5, 2023, 12:49 PM Joey Tran

Re: [QUESTION] Why no auto labels?

2023-10-05 Thread Joey Tran
Is it really toggleable in Java? I imagine that if it's a toggle it'd be a very sticky toggle since it'd be easy for PTransforms to accidentally rely on it. On Thu, Oct 5, 2023 at 12:33 PM Robert Bradshaw wrote: > Huh. This used to be a hard error in Java, but I guess it's togglable > with an

Re: [QUESTION] Why no auto labels?

2023-10-05 Thread Robert Bradshaw via dev
Huh. This used to be a hard error in Java, but I guess it's togglable with an option now. We should probably add the option to toggle Python too. (Unclear what the default should be, but this probably ties into re-thinking how pipeline update should work.) On Thu, Oct 5, 2023 at 4:58 AM Joey Tran

Re: [QUESTION] Why no auto labels?

2023-10-05 Thread Joey Tran
Makes sense that the requirement is the same, but is the label auto-generation behavior the same? I modified the BeamJava wordcount example[1] to do the regex filter twice in a row, and unlike the BeamPython example I posted before, it just warns instead of throwing an exception. Tangentially, is

Re: [QUESTION] Why no auto labels?

2023-10-04 Thread Byron Ellis via dev
This feels like something that maybe should be more explicit? Overloading the transform name to provide a unique stable id feels like perhaps too much magic... also maybe feels like this is leaking specific runner behavior? I get that it's convenient On Wed, Oct 4, 2023 at 9:16 AM Robert Bradshaw

Re: [QUESTION] Why no auto labels?

2023-10-04 Thread Robert Bradshaw via dev
BeamJava and BeamPython have the exact same behavior: transform names within must be distinct [1]. This is because we do not necessarily know at pipeline construction time if the pipeline will be streaming or batch, or if it will be updated in the future, so the decision was made to impose this

Re: [QUESTION] Why no auto labels?

2023-10-04 Thread Joey Tran
Cross posting this thread to dev@ to see if this is intentional behavior or if it's something worth changing for the python SDK On Tue, Oct 3, 2023, 10:10 PM XQ Hu via user wrote: > That suggests the default label is created as that, which indeed causes > the duplication error. > > On Tue, Oct