Hi all, I'm trying to process a few hundred Avro files on GCS. The files are decoded and two simple filters are applied. When I run this on the Beam Direct runner, all heap space fills up within a minute or two. I threw 58 GB at it before giving up.
To limit the number of files processed at once, I moved the actual processing into a Pipeline Executor. Alas, when running on Beam Direct, the transforms in the executed pipeline appear to be only initialised but never executed. This affects Write to Log, JavaScript, HTTP Client and BigQuery Output. Everything behaves as expected when I configure the Pipeline Executor to use the local runner. So, two questions: is the Pipeline Executor transform incompatible with Beam? And are there other approaches to limiting memory use in a case like this?

Cheers,
Fabian
