Thanks a lot for the tip and for looking into the code. A lot of cleanup
certainly needs to happen ;-)
The original tip for using @FinishBundle came from Maximilian and it does
indeed work like a charm.

For some reason I couldn't find the annotations for the DoFn main methods,
but I really should have looked into DoFn.java a bit earlier. This whole
Kettle Beam thing started out as a bit of a side project. I didn't know it
was going to work so quickly.
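Because the @FinishBundle tip is what made batching work, here is a minimal plain-Java sketch of the pattern. Rows are buffered during per-element processing, and whatever is left over is flushed when the bundle ends. It deliberately does not depend on the Beam SDK. The class names (BundleBatchingSketch, BatchingWriter), the batch size and the fake bundles in main() are all hypothetical. In a real DoFn, the runner calls the methods annotated @StartBundle, @ProcessElement and @FinishBundle, and it alone decides where bundles begin and end.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java simulation of a DoFn's bundle lifecycle. The method names mirror
 * Beam's @StartBundle, @ProcessElement and @FinishBundle hooks, but this is
 * NOT a Beam DoFn: main() below plays the role of the runner.
 */
public class BundleBatchingSketch {

    static class BatchingWriter {
        private final int maxBatchSize;
        private final List<String> buffer = new ArrayList<>();
        // Hypothetical stand-in for calls to an expensive sink (e.g. a bulk load).
        final List<List<String>> flushed = new ArrayList<>();

        BatchingWriter(int maxBatchSize) {
            if (maxBatchSize <= 0) {
                throw new IllegalArgumentException("maxBatchSize must be positive");
            }
            this.maxBatchSize = maxBatchSize;
        }

        // Like @StartBundle: reset per-bundle state.
        void startBundle() {
            buffer.clear();
        }

        // Like @ProcessElement: cheap per-row work, plus a flush when the buffer is full.
        void processElement(String row) {
            buffer.add(row);
            if (buffer.size() >= maxBatchSize) {
                flush();
            }
        }

        // Like @FinishBundle: flush the remainder so no rows are left behind
        // when the runner closes the bundle.
        void finishBundle() {
            flush();
        }

        private void flush() {
            if (!buffer.isEmpty()) {
                flushed.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
    }

    public static void main(String[] args) {
        BatchingWriter writer = new BatchingWriter(3);
        // A real runner decides bundle boundaries; here we fake two bundles.
        List<List<String>> bundles = List.of(List.of("a", "b", "c", "d"), List.of("e"));
        for (List<String> bundle : bundles) {
            writer.startBundle();
            for (String row : bundle) {
                writer.processElement(row);
            }
            writer.finishBundle();
        }
        System.out.println(writer.flushed); // prints [[a, b, c], [d], [e]]
    }
}
```

The final flush lives in the finish-bundle hook rather than a teardown-style hook because Beam does not guarantee that @Teardown is called. Buffered rows should therefore be written out before the bundle is considered complete.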

But now I'm happy with how things are going. On the whole it's really awesome
to be able to drag and drop a Kettle transformation together, unit test it,
run it against the direct runner to see all is well and then run it against
bigger data sets in Dataflow or Spark. It really speeds up development in
the sense that it's quite easy to test a lot of different scenarios.

Anyway, this has indeed been a lot of fun. Thanks for making it possible with
the Apache Beam project!

Cheers,
Matt

On Sun, Feb 17, 2019 at 06:43 Kenneth Knowles <k...@apache.org> wrote:

> I think my favorite line is "Haven’t had this much fun since I started
> Kettle" :-)
>
> I was browsing https://github.com/mattcasters/kettle-beam/ to see if I
> could comment on how to take advantage of bundling (where to move expensive
> logic into @FinishBundle; you will notice that Beam's IO connectors use this).
>
> I noticed that the core DoFns are defined in org.kettle.beam.core which is
> in the separate https://github.com/mattcasters/kettle-beam-core/
> repository. JFYI for the sake of users / code lurkers.
>
> The only place that looked like it did the sort of work where bundling
> would matter is the StepTransform. There's already a separate @FinishBundle
> there - does more of the logic need to be moved there?
>
> Kenn
>
> On Tue, Feb 12, 2019 at 8:01 AM Maximilian Michels <m...@apache.org> wrote:
>
>> Yes, you can use Flink's local execution mode, which is the default if
>> you don't provide any settings. A cluster should not be necessary to
>> complete the integration. Ideally, it should work out of the box :)
>>
>> > However, I'm first trying to solve the complicated issue of grouping
>> > records together in Beam in a safe way so that they can be batched up
>>
>> I'm not sure what your use case is, but Beam does batching by default.
>> The batches are called bundles. The Flink Runner supports setting the
>> bundle size.
>>
>> Cheers,
>> Max
>>
>> On 12.02.19 12:20, Matt Casters wrote:
>> > Yes, Flink is obviously the next target. I'm not expecting too many
>> > issues there beyond getting a cluster set up to test on. I read you can
>> > run the Flink Runner locally, so that will help a lot in testing.
>> >
>> > However, I'm first trying to solve the complicated issue of grouping
>> > records together in Beam in a safe way so that they can be batched up.
>> > Batching up is really important for fast loading into a lot of output
>> > targets. I'll probably use some group by behind the scenes or something
>> > like that; I need to think about that.
>> > Having the ability to re-use the existing Kettle steps without having to
>> > write new code is really key.
>> >
>> > Once that is done (in a few weeks) I'll give Flink a shot.
>> >
>> > Cheers,
>> >
>> > Matt
>> >
>> > On Tue, Feb 12, 2019 at 12:02 Maximilian Michels <m...@apache.org> wrote:
>> >
>> >     @Dan: Thanks for sharing the presentation. Kettle is a great way to
>> >     make Beam more accessible.
>> >
>> >     @Matt: Thanks for the plug. It's good to hear you enjoyed it. I think
>> >     the link to your slides got messed up: http://beam.kettle.be
>> >
>> >     Are you planning to add execution via the Flink Runner to Kettle?
>> >     I saw in the presentation that you already support Direct, Spark, and
>> >     Dataflow.
>> >
>> >     On 11.02.19 20:50, Matt Casters wrote:
>> >      > By the way, Maximilian, I linked and plugged your wonderful FOSDEM
>> >      > presentation in my slides http://beam kettle.be slide 19. If you
>> >      > mind, let me know and I'll get it out of the slides. In any case,
>> >      > great content worth promoting, I thought.
>> >      >
>> >      > On Wed, Feb 6, 2019 at 18:03 Maximilian Michels <m...@apache.org> wrote:
>> >      >
>> >      >     Hi Dan,
>> >      >
>> >      >     Thanks for the info. Would be great to share a video of the
>> >      >     presentation.
>> >      >
>> >      >     Cheers,
>> >      >     Max
>> >      >
>> >      >     On 30.01.19 10:00, Dan wrote:
>> >      >      > Hi, in just over a week you're all welcome to come and see the
>> >      >      > very first public reveal of Kettle running on Beam! (Including
>> >      >      > Spark, Dataflow and Flink support)
>> >      >      >
>> >      >      > https://www.meetup.com/Pentaho-London-User-Group/events/256773962/
>> >      >      >
>> >      >      > So this ingenious integration combines the power of visual
>> >      >      > development with the platform-agnostic benefits of Beam.
>> >      >      > Impressive stuff. No vendor lock-in here!
>> >      >      >
>> >      >      >
>> >      >      > See you there!
>> >      >      > Dan
>> >      >
>> >
>>
>
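Matt raised the question of grouping records so they can be loaded in batches. One simple approach is to group by key and then split each key's values into fixed-size chunks. The plain-Java sketch below shows that logic in memory only. The names (KeyedBatchingSketch, batchByKey, the "customers"/"orders" keys) are hypothetical, and this is an illustration, not how a runner executes it. In a real Beam pipeline the grouping happens across workers, per key and window. The Beam SDK also provides a GroupIntoBatches transform for keyed (KV) input that targets this case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration of "group by key, then split into batches". */
public class KeyedBatchingSketch {

    /**
     * Groups (key, value) rows by key and splits each key's values into
     * batches of at most batchSize elements, preserving arrival order.
     */
    static Map<String, List<List<String>>> batchByKey(
            List<Map.Entry<String, String>> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, List<List<String>>> batchesByKey = new LinkedHashMap<>();
        for (Map.Entry<String, String> row : rows) {
            List<List<String>> batches =
                batchesByKey.computeIfAbsent(row.getKey(), k -> new ArrayList<>());
            // Start a new batch when there is none yet or the last one is full.
            if (batches.isEmpty() || batches.get(batches.size() - 1).size() == batchSize) {
                batches.add(new ArrayList<>());
            }
            batches.get(batches.size() - 1).add(row.getValue());
        }
        return batchesByKey;
    }

    public static void main(String[] args) {
        // Hypothetical rows, keyed by the output target they are destined for.
        List<Map.Entry<String, String>> rows = List.of(
            Map.entry("customers", "c1"),
            Map.entry("orders", "o1"),
            Map.entry("customers", "c2"),
            Map.entry("customers", "c3"));
        // Each inner list would become one bulk-load call to its target.
        System.out.println(batchByKey(rows, 2));
        // prints {customers=[[c1, c2], [c3]], orders=[[o1]]}
    }
}
```

Keying by output target keeps each batch homogeneous, so a single bulk call can load it.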
