Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-06 Thread Matt Casters
is pretty obscure don't you think? Great discussion :-) Cheers, Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder On Sun, 5 May 2019 at 00:18, kant kodali wrote: > I believe this comes down to more of abstractions vs execution engines

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-04 Thread Matt Casters
back: why on Earth would you *want* to write for a specific platform? Are you *really* interested in those 0.1% use cases and is it really helping your business move forward? It's possible but if not, I would strongly advise against it. Just my 2 cents. Cheers, Matt --- Matt Casters at

Re: Hi, some sample about Extracting data from Xlsx ?

2019-04-16 Thread Matt Casters
Kettle indeed uses POI for xlsx but you can configure it in the Excel Input step. Kettle on Apache Beam would read the file(s) in a single thread as discussed earlier on the user beam mailing list. You can download a version with Beam over here: http://www.kettle.be/ Cheers, Matt --- Matt

Re: Accessing PipelineOptions in DoFn @Setup

2019-02-28 Thread Matt Casters
ettle/beam/core/fn/StringToKettleRowFn.java So grab the information earlier from the pipeline (or don't use the pipeline options like that) and pass it down through a constructor. HTH, Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder On Thu, 28 Feb.
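The advice above (read the value while the pipeline is being built and hand it to the function through its constructor) can be sketched in plain Java. This is only an illustration under stated assumptions: SplitLineFn, its setup() method, and ConstructorConfigDemo are hypothetical names, not Kettle or Beam classes, and the serialization round trip merely imitates how a runner might ship a function object to a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

// Hypothetical stand-in for a DoFn: the function object is serialized
// and sent to workers, so its configuration must live in serializable fields.
class SplitLineFn implements Serializable {
    private static final long serialVersionUID = 1L;

    // Plain value captured while the pipeline is being constructed.
    private final String delimiter;

    // Rebuilt on the worker; transient so it is never serialized.
    private transient Pattern splitter;

    SplitLineFn(String delimiter) {
        this.delimiter = delimiter;
    }

    // Plays the role of a @Setup method: it only uses fields set by the
    // constructor and does not need access to any pipeline options.
    void setup() {
        splitter = Pattern.compile(Pattern.quote(delimiter));
    }

    String[] process(String line) {
        return splitter.split(line, -1);
    }
}

class ConstructorConfigDemo {
    // Serializes and deserializes a value, imitating the trip to a worker.
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in a real pipeline this value would be read from the
        // options object before the function is created.
        String delimiter = ";";
        SplitLineFn fn = roundTrip(new SplitLineFn(delimiter));
        fn.setup();
        System.out.println(String.join("|", fn.process("a;b;c")));
    }
}
```

The design choice is that the function carries only small, serializable values, while anything expensive or non-serializable is rebuilt from them in the setup step.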

Re: Kettle Beam 0.5.0

2019-02-25 Thread Matt Casters
Beam? (I > don't know what a Kettle connector API looks like) > > It would be cool to make these connectors more broadly available to Beam > users, though maybe not optimal for parallel big data reads. > > Kenn > > On Sun, Feb 24, 2019 at 1:13 PM Matt Casters

Kettle Beam 0.5.0

2019-02-24 Thread Matt Casters
from conceptual work to something we can consider to be stable. Apache Beam has really made a huge difference. Cheers, Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder

Re: [ANNOUNCE] Apache Beam 2.10.0 released!

2019-02-18 Thread Matt Casters
Hi Beam-ers, Congratulations on the release! I did a quick upgrade and tried to run my test ETL jobs on Direct & Flink runners with great success. DataFlow however is throwing all sorts of errors. For example: * Handler for GET /v1.27/images/ gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.

Re: Visual Beam - First demonstration - London

2019-02-17 Thread Matt Casters
should work out of the box :) >> >> > However, I'm first trying to solve the complicated issue of grouping >> records together in Beam in a safe way so that they can batched up >> >> I'm not sure what your use case is but Beam does batching by default. >

Re: Visual Beam - First demonstration - London

2019-02-12 Thread Matt Casters
@Matt: Thanks for the plug. It's good to hear you enjoyed it. I think > the link to your slides got messed up: http://beam.kettle.be > > Are you planning to add execution via the Flink Runner to Kettle? Saw in > the presentation that you already support Direct, Spark, and Dataflow.

Re: Visual Beam - First demonstration - London

2019-02-11 Thread Matt Casters
By the way, Maximilian, I linked and plugged your wonderful FOSDEM presentation in my slides http://beam kettle.be slide 19. If you mind, let me know and I'll get it out of the slides. In any case, great content worth promoting I thought. On Wed, 6 Feb 2019, 18:03, Maximilian Michels wrote: Hi Dan,

Re: Spark: No TransformEvaluator registered

2019-01-29 Thread Matt Casters
That was it Juan Carlos. Can't thank you enough. I now have generic Kettle transformations running on the Direct Runner, Dataflow and Spark. Cheers, Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder On Tue, 29 Jan 2019 at 18:19, Matt

Re: Spark: No TransformEvaluator registered

2019-01-29 Thread Matt Casters
No you're right. I got so focused on getting org.apache.hadoop.fs.FileSystem in order that I forgot about the other files. Doh! Thanks for the tip! --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder On Tue, 29 Jan 2019 at 16:41, Juan Carlo

Spark: No TransformEvaluator registered

2019-01-29 Thread Matt Casters
M.txt) It's the exact same file and pipeline I've been testing on local files in the direct runner and on Google Storage in DataFlow without a problem. What could I have missed in the Spark case? Thanks again for any suggestions! Matt --- Matt Casters attcast...@gmail.com> Senior

Re: Spark progress feedback

2019-01-29 Thread Matt Casters
certain possible drivers between hadoop-common-2.6.5.jar and hadoop-hdfs-2.6.5.jar depending which one got packaged first. For the maven adepts there are plugins that fix the collision. Right now this issue is gone for me. Thanks! Matt --- Matt Casters attcast...@gmail.com> Senior Solution Archit

Re: Spark progress feedback

2019-01-28 Thread Matt Casters
the machine used for launching has all the > HDFS environment variables set, as the pipeline is being configured in the > launching machine before it hits the worker machine. > > Good luck > JC > > > On Mon, 28 Jan 2019, 13:34, Matt Casters wrote: > >

Spark progress feedback

2019-01-28 Thread Matt Casters
't think it's all that representative of real-world scenarios. Thanks anyway in advance for any suggestions, Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder

Re: Spark

2019-01-19 Thread Matt Casters
hen, from the same GUI, launch them on a Direct runner, in DataFlow, on Spark, Flink, ... I think it will be a first in the open source data integration world and it's all possible thanks to the Apache Beam team so on behalf of myself and our community: thanks again. Cheers, Matt --- Matt

Re: Spark

2019-01-18 Thread Matt Casters
ions apply for Flink. Is this because Spark and Flink lack the APIs to talk to a client about the state of workloads unlike DataFlow and the Direct Runner? Thanks! Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder On Thu, 17 Jan 2019 at 15:30

Spark

2019-01-17 Thread Matt Casters
'm either doing something wrong or I'm reading the docs wrong or the wrong docs. The thing is, if you try running your pipelines against a Spark master, feedback is really minimal, putting you in a trial-and-error situation pretty quickly. So thanks again in advance for any help! Cheers, Matt

Re: Single threaded processing

2019-01-08 Thread Matt Casters
he input side like generic JDBC / system information and so on. Probably this will bring another 50 or so Kettle steps into the functionality palette. Visual programming FTW! Cheers, Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder On Tue, 8

Re: Single threaded processing

2019-01-07 Thread Matt Casters
the Beam source code to figure something out. Cheers, Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder On Mon, 7 Jan 2019 at 23:09, Pablo Estrada wrote: > Hi Matt, > is this computation running as part of a larger pipeline that does ru

Single threaded processing

2019-01-07 Thread Matt Casters
readers or writers, what is a good alternative for ParDo? Thanks in advance for any suggestions! Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder

Visual Beam Development w/ Kettle

2018-12-15 Thread Matt Casters
ou want to have it (it's not in a permanent location, no need to steer future readers in the wrong direction). Cheers, Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect at Neo4j, Kettle Project Founder

Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Matt Casters
Great news! Congratulations! My experience venturing into the world of Apache Beam couldn't possibly have been nicer. Thank you to all involved! --- Matt On Fri, 14 Dec 2018 at 04:42, Chamikara Jayalath wrote: > The Apache Beam team is pleased to announce the release of version 2.9.0! > > Apac

Re:

2018-12-10 Thread Matt Casters
--- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder On Fri, 30 Nov 2018 at 12:51, Matt Casters wrote: > I just wanted to thank you again. I split up my project into Beam core > stuff and my plugin. This got rid of a number of circular dependency > is

Re: 2019 Beam Events

2018-12-05 Thread Matt Casters
London, February 7th https://docs.google.com/presentation/d/1sXjpOOYqjnA-b8rl9tUktDJwMIW3nQ7mUVoU3q7PJmk/edit?usp=drivesdk On Wed, 5 Dec 2018, 22:23, Griselda Cuevas wrote: Where are you going to present them Matt? > > > > On Wed, 5 Dec 2018 at 02:24, Matt Casters wrote: > >

Re: 2019 Beam Events

2018-12-05 Thread Matt Casters
It has been a lot of fun to see Kettle transformations run on Beam. I think it will blow a lot of people's minds to actually see it in action when I present the results in February. --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founde

Re: Generic Type PTransform

2018-12-02 Thread Matt Casters
There are probably smarter people than me on this list but since I have recently been through a similar thought exercise... For the generic use in Kettle I have a PCollection going through the pipeline. KettleRow is just an Object[] wrapper for which I can implement a Coder. The "group by" that I impl

Re:

2018-11-30 Thread Matt Casters
anticipated I must admit. I'm in awe of how clean and intuitive the Beam API is (once you get the hang of it). Thanks for everything! https://github.com/mattcasters/kettle-beam-core https://github.com/mattcasters/kettle-beam Cheers, Matt --- Matt Casters attcast...@gmail.com> Senior Solution A

Re:

2018-11-29 Thread Matt Casters
not an expert on the details of how this works, >> but you'll probably have to make sure these filesystem dependencies >> are in your custom classloader's jar. >> >> [1] >> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/s

[no subject]

2018-11-29 Thread Matt Casters
here. Thanks a lot for any answers or tips you might have! Matt --- Matt Casters attcast...@gmail.com> Senior Solution Architect, Kettle Project Founder