I think my favorite line is "Haven’t had this much fun since I started Kettle" :-)
I was browsing https://github.com/mattcasters/kettle-beam/ to see if I could
comment on how to take advantage of bundling (moving expensive logic into
@FinishBundle; you will notice that Beam's IO connectors do this). I noticed
that the core DoFns are defined in org.kettle.beam.core, which lives in the
separate https://github.com/mattcasters/kettle-beam-core/ repository. JFYI for
the sake of users / code lurkers.

The only place that looked like it did the sort of work where bundling would
matter is the StepTransform. There's already a separate @FinishBundle there.
Does more of the logic need to be moved there?

Kenn

On Tue, Feb 12, 2019 at 8:01 AM Maximilian Michels <m...@apache.org> wrote:

> Yes, you can use Flink's local execution mode, which is the default if
> you don't provide any settings. A cluster should not be necessary to
> complete the integration. Ideally, it should work out of the box :)
>
> > However, I'm first trying to solve the complicated issue of grouping
> > records together in Beam in a safe way so that they can be batched up
>
> I'm not sure what your use case is but Beam does batching by default.
> The batches are called bundles. The Flink Runner supports setting the
> bundle size.
>
> Cheers,
> Max
>
> On 12.02.19 12:20, Matt Casters wrote:
> > Yes, Flink is obviously the next target. I'm not expecting too many
> > issues there beyond getting a cluster set up to test on. I read you can
> > run the Flink Runner locally so that will help a lot in testing.
> >
> > However, I'm first trying to solve the complicated issue of grouping
> > records together in Beam in a safe way so that they can be batched up.
> > Batching up is really important for fast loading into a lot of output
> > targets. I'll probably use some group by behind the scenes or something
> > like that, need to think about that.
> > Having the ability to re-use the existing Kettle steps without having to
> > write new code is really key.
> >
> > Once that is done (in a few weeks) I'll give Flink a shot.
> >
> > Cheers,
> >
> > Matt
> >
> > On Tue, Feb 12, 2019 at 12:02, Maximilian Michels <m...@apache.org> wrote:
> >
> > @Dan: Thanks for sharing the presentation. Kettle is a great way to
> > make Beam more accessible.
> >
> > @Matt: Thanks for the plug. It's good to hear you enjoyed it. I think
> > the link to your slides got messed up: http://beam.kettle.be
> >
> > Are you planning to add execution via the Flink Runner to Kettle?
> > Saw in the presentation that you already support Direct, Spark, and
> > Dataflow.
> >
> > On 11.02.19 20:50, Matt Casters wrote:
> > > By the way, Maximilian, I linked and plugged your wonderful FOSDEM
> > > presentation in my slides http://beam kettle.be <http://kettle.be>
> > > slide 19. If you mind, let me know and I'll get it out of the slides.
> > > In any case, great content worth promoting I thought.
> > >
> > > On Wed, Feb 6, 2019 at 18:03, Maximilian Michels <m...@apache.org> wrote:
> > >
> > > Hi Dan,
> > >
> > > Thanks for the info. Would be great to share a video of the
> > > presentation.
> > >
> > > Cheers,
> > > Max
> > >
> > > On 30.01.19 10:00, Dan wrote:
> > > > Hi, in just over a week you're all welcome to come and see the very
> > > > first public reveal of Kettle running on Beam! (Including Spark,
> > > > Dataflow and Flink support)
> > > >
> > > > https://www.meetup.com/Pentaho-London-User-Group/events/256773962/
> > > >
> > > > So this ingenious integration combines the power of visual
> > > > development with the platform-agnostic benefits of Beam. Impressive
> > > > stuff. No vendor lock-in here!
> > > >
> > > > See you there!
> > > > Dan
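
For code lurkers following the bundling discussion above, here is a minimal sketch of the buffer-and-flush pattern that @FinishBundle enables. It is a plain-Java model of the bundle lifecycle, not the Beam API and not the kettle-beam StepTransform: the class name BatchingWriter, the maxBatchSize parameter, and the sink callback are all hypothetical. In an actual Beam DoFn, the three lifecycle methods would carry the @StartBundle, @ProcessElement, and @FinishBundle annotations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Plain-Java model of per-bundle batching (hypothetical, not the Beam API).
 * Rows are buffered as they arrive and handed to an expensive bulk sink
 * either when the buffer is full or when the bundle ends.
 */
final class BatchingWriter {
    private final int maxBatchSize;
    // Hypothetical stand-in for an expensive bulk write (e.g. a batched insert).
    private final Consumer<List<String>> sink;
    private List<String> buffer = new ArrayList<>();

    BatchingWriter(int maxBatchSize, Consumer<List<String>> sink) {
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    // Would be annotated @StartBundle in a Beam DoFn.
    void startBundle() {
        buffer = new ArrayList<>();
    }

    // Would be annotated @ProcessElement in a Beam DoFn.
    void processElement(String row) {
        buffer.add(row);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Would be annotated @FinishBundle in a Beam DoFn.
    void finishBundle() {
        // The last, possibly partial, batch must go out before the bundle ends.
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand the sink its own copy so clearing the buffer cannot affect it.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

The flush in finishBundle is the important part: the runner may treat the bundle's elements as done once the bundle finishes, so anything still sitting in the buffer at that point should be written out rather than carried over.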