I don’t think it’s possible to extend in a way that you are asking (like, Java classes “extend"). Though, you can create your own composite PTransform that will incorporate one or several others inside “expand()” method. Actually, most of the Beam native PTransforms are composite transforms. Please, take a look on doc and examples [1]
Regarding your example, please, be aware that all PTransforms are supposed to be executed in distributed environment and the order of records is not guaranteed. So, limiting the whole output by fixed number of records can be challenging - you’d need to make sure that it will be processed on only one worker, that means that you’d need to shuffle all your records by the same key and probably sort the records in way that you need. Did you consider to use “org.apache.beam.sdk.transforms.Top” for that? [2] If it doesn’t work for you, could you provide more details of your use case? Then we probably can propose the more suitable solutions for that. [1] https://beam.apache.org/documentation/programming-guide/#composite-transforms [2] https://beam.apache.org/releases/javadoc/2.50.0/org/apache/beam/sdk/transforms/Top.html — Alexey > On 15 Sep 2023, at 14:22, Joey Tran <[email protected]> wrote: > > Is there a way to extend already defined PTransforms? My question is probably > better illustrated with an example. Let's say I have a PTransform that > generates a very variable number of outputs. I'd like to "wrap" that > PTransform such that if it ever creates more than say 1,000 outputs, then I > just take the first 1,000 outputs without generating the rest of the outputs. > > It'd be trivial if I have access to the DoFn, but what if the PTransform in > question doesn't expose the `DoFn`?
