Hi Hans,

For now I'm stuck on Beam-Direct, as HOP-4193 <https://issues.apache.org/jira/browse/HOP-4193> is keeping me from using Dataflow. The amount of data involved is reasonably small, so this works OK for now.

As reported, the pipeline executor does not work on Beam. But I've found a workaround for now: I execute the "embedded" pipeline via the local runner and write the results to GCS, then pick them up in a later pipeline to insert them into BigQuery.

cheers
Fabian

> On 10.10.2022 at 18:35, Hans Van Akelyen <[email protected]> wrote:
>
> Hi Fabian,
>
> Could you provide a bit more information? In the past couple of weeks, some
> major changes have been made to improve the performance.
> Are you using a Hop local engine configuration when executing the pipeline
> executor, or trying Beam-Direct? If it is the second, I fear that's not
> really supported currently, or definitely untested.
>
> That being said, Beam Direct is an engine type mainly for testing
> implementations, not made for actual heavy lifting. I would test the
> implementation with a couple of files and do the actual heavy processing
> using Dataflow, Spark, or Flink.
>
> In one of our next releases, we are planning to add an "Advisor" which will
> warn on transforms we have not yet tested, or that we know will not always
> give the expected results.
>
> Cheers,
> Hans
>
> On Mon, 10 Oct 2022 at 10:28, Fabian Peters <[email protected]> wrote:
> Hi all,
>
> I'm trying to process a few hundred Avro files on GCS. They are getting
> decoded and two simple filters are being applied. When running this on
> Beam-Direct, all heap space is getting filled within a minute or two. I threw
> 58 GB at it before giving up.
>
> To limit the number of files getting processed at once, I have moved the
> actual processing into a pipeline executor. Alas, when running on
> Beam-Direct, it looks like the transforms are only initialised but do not get
> executed. This concerns Write to Log, JavaScript, HTTP Client and BigQuery
> Output. Everything behaves as expected when I configure the pipeline executor
> to use the Local runner.
>
> So, two questions: Is the pipeline executor transform incompatible with Beam?
> And, are there other approaches for limiting memory use in such a case?
>
> cheers
>
> Fabian
