Good morning Matt,

Thanks for your quick reply! Unfortunately, the inputs are not sorted, so the Merge Join transform is not an option. I guess I'll have to use temporary BigQuery tables to handle this. Those pipelines are all bounded, so that approach would work. Or is there an easy way to sort things when running on Beam?

I'll create a Jira ticket, no problem.

cheers

Fabian

> On 01.09.2022 at 19:11, Matt Casters <[email protected]> wrote:
>
> Hi Fabian,
>
> Joining rows is indeed the exception in Beam. I would suggest you use the
> Merge Join
> <https://hop.apache.org/manual/latest/pipeline/transforms/mergejoin.html>
> transform.
> For unbounded pipelines (never ending) that transform will be handled
> <https://github.com/apache/hop/blob/master/plugins/engines/beam/src/main/java/org/apache/hop/beam/pipeline/handler/BeamMergeJoinTransformHandler.java>
> correctly.
> If you don't mind, please create a JIRA case so we can create a similar
> handler for the Cartesian product use-case.
> The code usually is non-trivial in the massive parallel world but quite
> doable ;-)
>
> All the best,
> Matt
>
>
> On Thu, Sep 1, 2022 at 6:37 PM Fabian Peters <[email protected]
> <mailto:[email protected]>> wrote:
> Hi all,
>
> I've hit the next problem, this time something I thought I had tested on Beam
> before: A pipeline containing a "Join rows (cartesian product)" transform
> with input from two sources loops forever when run via Beam-Direct or
> Dataflow. It works fine using the local runner.
>
> While running it on Beam-Direct I've attached a debugger and can see that it
> is stuck in the while loop at JoinRows.java:486
> <https://github.com/apache/hop/blob/758c07c360c26c0447251f0a29df81557864ad11/plugins/transforms/joinrows/src/main/java/org/apache/hop/pipeline/transforms/joinrows/JoinRows.java#L487>.
> I've tried using a GCS temp directory and swapped the "Main transform to
> read from" but none of those helped.
>
> Is this transform incompatible with Beam? If so, what could I use instead?
>
> cheers
>
> Fabian
>
> <PastedGraphic-8.png>
>
>
> --
> Neo4j Chief Solutions Architect
> ✉ [email protected] <mailto:[email protected]>
