Looks like this is the current behaviour. If you have `t =
beam.Filter(identity_filter)`, `t.label` is defined as
`Filter(identity_filter)`.

On Mon, Oct 2, 2023 at 9:25 AM Joey Tran <joey.t...@schrodinger.com> wrote:

> You don't have to specify the names if the callable you pass in is
> /different/ for two `beam.Map`s, but  if the callable is the same you must
> specify a label. For example, the below will raise an exception:
>
> ```
>         | beam.Filter(identity_filter)
>         | beam.Filter(identity_filter)
> ```
>
> Here's an example on playground that shows the error message you get [1].
> I marked every line I added with a "# ++".
>
> It's a contrived example, but using a map or filter at the same pipeline
> level probably comes up often, at least in my inexperience. For example,
> you. might have a pipeline that partitions a pcoll into three different
> pcolls, runs some transforms on them, and then runs the same type of filter
> on each of them.
>
> The case that happens most often for me is using the `assert_that` [2]
> testing transform. In this case, I think often users will really have no
> need for a disambiguating label as they're often just writing unit tests
> that test a few different properties of their workflow.
>
> [1] https://play.beam.apache.org/?sdk=python&shared=hIrm7jvCamW
> [2]
> https://beam.apache.org/releases/pydoc/2.29.0/apache_beam.testing.util.html#apache_beam.testing.util.assert_that
>
> On Mon, Oct 2, 2023 at 9:08 AM Bruno Volpato via user <
> user@beam.apache.org> wrote:
>
>> If I understand the question correctly, you don't have to specify those
>> names.
>>
>> As Reuven pointed out, it is probably a good idea so you have a stable /
>> deterministic graph.
>> But in the Python SDK, you can simply use pcollection | map_fn, instead
>> of pcollection | 'Map' >> map_fn.
>>
>> See an example here
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/group_with_coder.py#L100-L116
>>
>>
>> On Sun, Oct 1, 2023 at 9:08 PM Joey Tran <joey.t...@schrodinger.com>
>> wrote:
>>
>>> Hmm, I'm not sure what you mean by "updating pipelines in place". Can
>>> you elaborate?
>>>
>>> I forgot to mention my question is posed from the context of a python
>>> SDK user, and afaict, there doesn't seem to be an obvious way to
>>> autogenerate names/labels. Hearing that the java SDK supports it makes me
>>> wonder if the python SDK could support it as well though... (If so, I'd be
>>> happy to do implement it). Currently, it's fairly tedious to have to name
>>> every instance of a transform that you might reuse in a pipeline, e.g. when
>>> reapplying the same Map on different pcollections.
>>>
>>> On Sun, Oct 1, 2023 at 8:12 PM Reuven Lax via user <user@beam.apache.org>
>>> wrote:
>>>
>>>> Are you talking about transform names? The main reason was because for
>>>> runners that support updating pipelines in place, the only way to do so
>>>> safely is if the runner can perfectly identify which transforms in the new
>>>> graph match the ones in the old graph. There's no good way to auto generate
>>>> names that will stay stable across updates - even small changes to the
>>>> pipeline might change the order of nodes in the graph, which could result
>>>> in a corrupted update.
>>>>
>>>> However, if you don't care about update, Beam can auto generate these
>>>> names for you! When you call PCollection.apply (if using BeamJava), simply
>>>> omit the name parameter and Beam will auto generate a unique name for you.
>>>>
>>>> Reuven
>>>>
>>>> On Sat, Sep 30, 2023 at 11:54 AM Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> After writing a few pipelines now, I keep getting tripped up from
>>>>> accidentally have duplicate labels from using multiple of the same
>>>>> transforms without labeling them. I figure this must be a common 
>>>>> complaint,
>>>>> so I was just curious, what the rationale behind this design was? My naive
>>>>> thought off the top of my head is that it'd be more user friendly to just
>>>>> auto increment duplicate transforms, but I figure I must be missing
>>>>> something
>>>>>
>>>>> Cheers,
>>>>> Joey
>>>>>
>>>>

Reply via email to