I'm aware of composite transforms and of the distributed nature of
PTransforms. I'm not suggesting limiting the entire set and my example was
more illustrative than the actual use case.

My actual use case is basically: I have multiple PTransforms, and let's say
most of them average ~100 generated outputs for a single input. Most of
these PTransforms will occasionally run into an input though that might
output maybe 1M outputs. This can cause issues if for example there are
transforms that follow it that require a lot of compute per input.

The simplest way to deal with this is to modify the `DoFn`s in our
Ptransforms and add a limiter in the logic (e.g. `if num_outputs_generated
>= OUTPUTS_PER_INPUT_LIMIT: return`). We could duplicate this logic across
our transforms, but it'd be much cleaner if we could lift up this limiting
logic out of the application logic and have some generic wrapper that
extends our transforms.

Thanks for the discussion!

On Fri, Sep 15, 2023 at 10:29 AM Alexey Romanenko <aromanenko....@gmail.com>
wrote:

> I don’t think it’s possible to extend in a way that you are asking (like,
> Java classes “*extend*"). Though, you can create your own composite
> PTransform that will incorporate one or several others inside *“expand()”*
> method. Actually, most of the Beam native PTransforms are composite
> transforms. Please, take a look on doc and examples [1]
>
> Regarding your example, please, be aware that all PTransforms are supposed
> to be executed in distributed environment and the order of records is not
> guaranteed. So, limiting the whole output by fixed number of records can be
> challenging - you’d need to make sure that it will be processed on only one
> worker, that means that you’d need to shuffle all your records by the same
> key and probably sort the records in way that you need.
>
> Did you consider to use “*org.apache.beam.sdk.transforms.Top*” for that?
> [2]
>
> If it doesn’t work for you, could you provide more details of your use
> case? Then we probably can propose the more suitable solutions for that.
>
> [1]
> https://beam.apache.org/documentation/programming-guide/#composite-transforms
> [2]
> https://beam.apache.org/releases/javadoc/2.50.0/org/apache/beam/sdk/transforms/Top.html
>
> —
> Alexey
>
> On 15 Sep 2023, at 14:22, Joey Tran <joey.t...@schrodinger.com> wrote:
>
> Is there a way to extend already defined PTransforms? My question is
> probably better illustrated with an example. Let's say I have a PTransform
> that generates a very variable number of outputs. I'd like to "wrap" that
> PTransform such that if it ever creates more than say 1,000 outputs, then I
> just take the first 1,000 outputs without generating the rest of the
> outputs.
>
> It'd be trivial if I have access to the DoFn, but what if the PTransform
> in question doesn't expose the `DoFn`?
>
>
>

Reply via email to