Re: "Decorator" pattern for PTransforms

Alexey Romanenko Fri, 15 Sep 2023 07:29:06 -0700

I don’t think it’s possible to extend a PTransform in the way you are asking
(that is, the way one Java class “extends” another). However, you can create
your own composite PTransform that applies one or several other transforms
inside its “expand()” method. In fact, most of Beam’s native PTransforms are
composite transforms. Please take a look at the documentation and examples [1].
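
To illustrate the composite approach, here is a minimal sketch, not a
definitive implementation. The class name CappedOutput and its limit parameter
are hypothetical. It wraps an arbitrary inner transform and then applies
Sample.any(limit) to the result. Note the assumptions: this only truncates the
output after the inner transform has produced it, so it does not by itself
prevent the remaining records from being generated, and Sample.any returns an
arbitrary subset of up to "limit" elements rather than the "first" ones.

```java
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.Sample;
import org.apache.beam.sdk.values.PCollection;

/** Hypothetical composite transform: applies an inner transform, then caps its output size. */
class CappedOutput<InT, OutT> extends PTransform<PCollection<InT>, PCollection<OutT>> {
  // The wrapped transform; the wildcard mirrors the signature of PCollection.apply().
  private final PTransform<? super PCollection<InT>, PCollection<OutT>> inner;
  // Maximum number of elements to keep (an illustrative parameter).
  private final long limit;

  CappedOutput(PTransform<? super PCollection<InT>, PCollection<OutT>> inner, long limit) {
    this.inner = inner;
    this.limit = limit;
  }

  long getLimit() {
    return limit;
  }

  @Override
  public PCollection<OutT> expand(PCollection<InT> input) {
    // Step 1: run the wrapped transform unchanged.
    PCollection<OutT> all = input.apply("Inner", inner);
    // Step 2: keep an arbitrary subset of at most `limit` elements.
    return all.apply("Cap", Sample.<OutT>any(limit));
  }
}
```

It would be applied like any other transform, for example
input.apply(new CappedOutput<>(someTransform, 1000)), where someTransform
stands for the PTransform you want to wrap.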

Regarding your example, please be aware that all PTransforms are supposed to
be executed in a distributed environment, and the order of records is not
guaranteed. So limiting the whole output to a fixed number of records can be
challenging: you’d need to make sure that it is processed on only one worker,
which means you’d need to shuffle all your records by the same key and
probably sort them in the way that you need.

Have you considered using “org.apache.beam.sdk.transforms.Top” for that? [2]
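
As a rough sketch of what using Top could look like (the class and method
names TopExample and largestThree are hypothetical, and Integer elements are
assumed only for illustration), Top.largest(count) combines the whole input
into a single List holding the largest "count" elements in decreasing order:

```java
import java.util.List;
import org.apache.beam.sdk.transforms.Top;
import org.apache.beam.sdk.values.PCollection;

class TopExample {
  /** Returns a PCollection with one element: a List of the (up to) 3 largest input values. */
  static PCollection<List<Integer>> largestThree(PCollection<Integer> values) {
    // Top.largest relies on the natural ordering of the elements (Comparable).
    return values.apply("Top3", Top.<Integer>largest(3));
  }
}
```

For elements without a natural ordering, Top.of(count, comparator) takes a
serializable Comparator instead.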

If it doesn’t work for you, could you provide more details about your use
case? Then we can probably propose a more suitable solution.

[1] https://beam.apache.org/documentation/programming-guide/#composite-transforms
[2] https://beam.apache.org/releases/javadoc/2.50.0/org/apache/beam/sdk/transforms/Top.html

—
Alexey

> On 15 Sep 2023, at 14:22, Joey Tran <joey.t...@schrodinger.com> wrote:
> 
> Is there a way to extend already defined PTransforms? My question is probably 
> better illustrated with an example. Let's say I have a PTransform that 
> generates a very variable number of outputs. I'd like to "wrap" that 
> PTransform such that if it ever creates more than say 1,000 outputs, then I 
> just take the first 1,000 outputs without generating the rest of the outputs.
> 
> It'd be trivial if I have access to the DoFn, but what if the PTransform in 
> question doesn't expose the `DoFn`?
