I'll take another look at the PR. My inclination is still to use uuids
to uniquify. I think that's worth the cost to the readability hit (I'm
OK reducing this down to 6-8 hex digits which will still give very low
chances of collisions, though it doesn't solve the first one). If
someone cares about
Just want to bump this discussion again. I'm introducing Beam to other
developers at my Schrodinger now and the first (of hopefully many!)
developer has started migrating our internal workflows to Beam. As I
suspected though, he's complained about the iteration cycles spent from
using the same
On Fri, Oct 13, 2023 at 1:18 PM Robert Bradshaw wrote:
> On Fri, Oct 13, 2023 at 10:08 AM Joey Tran
> wrote:
>
Are there places on the SDK side that expect unique labels? Or in
>> non-updateable runners?
>>
>
> That's a good question. The label eventually ends up here:
>
On Fri, Oct 13, 2023 at 10:08 AM Joey Tran
wrote:
> That makes sense. Would you suggest the new option simply suppress the
> RuntimeError and use the non-unique label?
>
Yes. (Or, rather, not raise it.)
> Are there places on the SDK side that expect unique labels? Or in
> non-updateable
That makes sense. Would you suggest the new option simply suppress the
RuntimeError and use the non-unique label? Are there places on the SDK side
that expect unique labels? Or in non-updateable runners?
Would making `--auto_unique_labels` a mutually exclusive option with
streaming be a
Thanks for the PR.
I think we should follow Java and allow non-unique labels, but not provide
automatic uniquification, In particular, the danger of using a counter is
that one can get accidental (and potentially hard to check) off-by-one
collisions. As a concrete example, imagine one partitions
For posterity: https://github.com/apache/beam/pull/28984
On Tue, Oct 10, 2023 at 7:29 PM Robert Bradshaw wrote:
> I would definitely support a PR making this an option. Changing the
> default would be a rather big change that would require more thought.
>
> On Tue, Oct 10, 2023 at 4:24 PM Joey
I would definitely support a PR making this an option. Changing the default
would be a rather big change that would require more thought.
On Tue, Oct 10, 2023 at 4:24 PM Joey Tran wrote:
> Bump on this. Sorry to pester - I'm trying to get a few teams to adopt
> Apache Beam at my company and I'm
Bump on this. Sorry to pester - I'm trying to get a few teams to adopt
Apache Beam at my company and I'm trying to foresee parts of the API they
might find inconvenient.
If there's a conclusion to make the behavior similar to java, I'm happy to
put up a PR
On Thu, Oct 5, 2023, 12:49 PM Joey Tran
Is it really toggleable in Java? I imagine that if it's a toggle it'd be a
very sticky toggle since it'd be easy for PTransforms to accidentally rely
on it.
On Thu, Oct 5, 2023 at 12:33 PM Robert Bradshaw wrote:
> Huh. This used to be a hard error in Java, but I guess it's togglable
> with an
Huh. This used to be a hard error in Java, but I guess it's togglable
with an option now. We should probably add the option to toggle Python too.
(Unclear what the default should be, but this probably ties into
re-thinking how pipeline update should work.)
On Thu, Oct 5, 2023 at 4:58 AM Joey Tran
Makes sense that the requirement is the same, but is the label
auto-generation behavior the same? I modified the BeamJava
wordcount example[1] to do the regex filter twice in a row, and unlike the
BeamPython example I posted before, it just warns instead of throwing an
exception.
Tangentially, is
This feels like something that maybe should be more explicit? Overloading
the transform name to provide a unique stable id feels like perhaps too
much magic... also maybe feels like this is leaking specific runner
behavior? I get that it's convenient
On Wed, Oct 4, 2023 at 9:16 AM Robert Bradshaw
BeamJava and BeamPython have the exact same behavior: transform names
within must be distinct [1]. This is because we do not necessarily know at
pipeline construction time if the pipeline will be streaming or batch, or
if it will be updated in the future, so the decision was made to impose
this
Cross posting this thread to dev@ to see if this is intentional behavior or
if it's something worth changing for the python SDK
On Tue, Oct 3, 2023, 10:10 PM XQ Hu via user wrote:
> That suggests the default label is created as that, which indeed causes
> the duplication error.
>
> On Tue, Oct
15 matches
Mail list logo