is pretty obscure don't you think?
Great discussion :-)
Cheers,
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Sun, May 5, 2019 at 00:18, kant kodali wrote:
> I believe this comes down to more of abstractions vs execution engines
back: why on Earth would
you *want* to write for a specific platform? Are you *really* interested
in those 0.1% use cases and is it really helping your business move
forward? It's possible but if not, I would strongly advise against it.
Just my 2 cents.
Cheers,
Matt
---
Matt Casters at
Kettle indeed uses POI for xlsx but you can configure it in the Excel Input
step. Kettle on Apache Beam would read the file(s) in a single thread as
discussed earlier on the Beam user mailing list.
You can download a version with Beam over here: http://www.kettle.be/
Cheers,
Matt
---
Matt
ettle/beam/core/fn/StringToKettleRowFn.java
So grab the information earlier from the pipeline (or don't use the pipeline
options like that) and pass it down through a constructor.
HTH,
Matt
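
The advice above (read the value while the pipeline is being built and hand it to the function through its constructor) can be sketched in plain Java. This is only an illustration under assumptions, not Kettle or Beam code: `SplitLineFn`, `roundTrip`, and the separator setting are hypothetical names, and the serialize/deserialize round trip merely stands in for a runner shipping the function object to a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.regex.Pattern;

public class ConstructorConfigSketch {

  /** Hypothetical stand-in for a function object that gets shipped to workers. */
  public static class SplitLineFn implements Serializable {
    private static final long serialVersionUID = 1L;

    // Captured at construction time, so the value travels with the serialized function.
    private final String separator;

    public SplitLineFn(String separator) {
      this.separator = separator;
    }

    public String[] apply(String line) {
      // Pattern.quote treats the separator literally; -1 keeps trailing empty fields.
      return line.split(Pattern.quote(separator), -1);
    }
  }

  /** Simulates shipping the function to a worker via Java serialization. */
  public static SplitLineFn roundTrip(SplitLineFn fn) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(fn);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (SplitLineFn) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // The separator is decided on the launching side and passed in explicitly.
    SplitLineFn onWorker = roundTrip(new SplitLineFn(";"));
    System.out.println(Arrays.toString(onWorker.apply("1;Matt;Kettle")));
  }
}
```

The point of the design is that the function no longer needs to look anything up at execution time: whatever it needs is a serializable field set by the constructor.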
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Thu, Feb 28
Beam? (I
> don't know what a Kettle connector API looks like)
>
> It would be cool to make these connectors more broadly available to Beam
> users, though maybe not optimal for parallel big data reads.
>
> Kenn
>
> On Sun, Feb 24, 2019 at 1:13 PM Matt Casters
>
from conceptual work to something we can
consider to be stable. Apache Beam has really made a huge difference.
Cheers,
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
Hi Beam-ers,
Congratulations on the release!
I did a quick upgrade and tried to run my test ETL jobs on Direct & Flink
runners with great success.
DataFlow however is throwing all sorts of errors. For example:
* Handler for GET /v1.27/images/
gcr.io/cloud-dataflow/v1beta3/beam-java-batch:beam-2.
should work out of the box :)
>>
>> > However, I'm first trying to solve the complicated issue of grouping
>> records together in Beam in a safe way so that they can be batched up
>>
>> I'm not sure what your use case is but Beam does batching by default.
>
@Matt: Thanks for the plug. It's good to hear you enjoyed it. I think
> the link to your slides got messed up: http://beam.kettle.be
>
> Are you planning to add execution via the Flink Runner to Kettle? Saw in
> the presentation that you already support Direct, Spark, and Dataflow.
By the way, Maximilian, I linked and plugged your wonderful FOSDEM
presentation in my slides http://beam kettle.be slide 19. If you mind, let
me know and I'll get it out of the slides. In any case, great content worth
promoting I thought.
On Wed, Feb 6, 2019 at 18:03, Maximilian Michels wrote:
Hi Dan,
That was it Juan Carlos. Can't thank you enough.
I now have generic Kettle transformations running on the Direct Runner,
Dataflow and Spark.
Cheers,
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Tue, Jan 29, 2019 at 18:19, Matt
No, you're right. I got so focused on getting
org.apache.hadoop.fs.FileSystem in order that I forgot about the other
files. Doh!
Thanks for the tip!
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Tue, Jan 29, 2019 at 16:41, Juan Carlo
M.txt)
It's the exact same file and pipeline I've been testing on local files in
the direct runner and on Google Storage in DataFlow without a problem.
What could I have missed in the Spark case?
Thanks again for any suggestions!
Matt
---
Matt Casters attcast...@gmail.com>
Senior
certain possible drivers between
hadoop-common-2.6.5.jar and hadoop-hdfs-2.6.5.jar depending on which one got
packaged first.
For the Maven adepts there are plugins that fix the collision.
Right now this issue is gone for me.
Thanks!
Matt
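
A quick way to check for this kind of collision is to list every copy of a resource visible on the classpath. The sketch below is a plain-Java diagnostic under assumptions: the snippet above is truncated, so it is only a guess that the colliding file is the `META-INF/services/org.apache.hadoop.fs.FileSystem` service descriptor, and `ServiceFileLister` and `find` are hypothetical names.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ServiceFileLister {

  /** Returns the URL of every classpath entry that provides the given resource. */
  public static List<URL> find(String resourceName) throws IOException {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = ClassLoader.getSystemClassLoader();
    }
    return Collections.list(loader.getResources(resourceName));
  }

  public static void main(String[] args) throws IOException {
    // Assumed resource name; pass a different one as the first argument if needed.
    String name = args.length > 0
        ? args[0]
        : "META-INF/services/org.apache.hadoop.fs.FileSystem";
    List<URL> hits = find(name);
    System.out.println(hits.size() + " copies of " + name);
    for (URL url : hits) {
      System.out.println("  " + url);
    }
  }
}
```

Run against the original (unpacked) classpath, more than one hit means several jars ship the same file, so a fat jar can only keep one of them unless the build merges them.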
---
Matt Casters attcast...@gmail.com>
Senior Solution Archit
the machine used for launching has all the
> HDFS environment variables set, as the pipeline is being configured on the
> launching machine before it hits the worker machine.
>
> Good luck
> JC
>
>
> On Mon, Jan 28, 2019 at 13:34, Matt Casters
> wrote:
>
>
't think it's all that
representative of real-world scenarios.
Thanks anyway in advance for any suggestions,
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
hen,
from the same GUI, launch them on a Direct runner, in DataFlow, on Spark,
Flink, ... I think it will be a first in the open source data integration
world and it's all possible thanks to the Apache Beam team so on behalf of
myself and our community: thanks again.
Cheers,
Matt
---
Matt
ions apply for Flink. Is this
because Spark and Flink lack the APIs to talk to a client about the state
of workloads unlike DataFlow and the Direct Runner?
Thanks!
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Thu, Jan 17, 2019 at 15:30
'm either doing something wrong or I'm reading the docs wrong
or the wrong docs.
The thing is, if you try running your pipelines against a Spark master,
feedback is really minimal, putting you in a trial & error situation pretty
quickly.
So thanks again in advance for any help!
Cheers,
Matt
he input side like generic JDBC / system
information and so on.
This will probably bring another 50 or so Kettle steps into the
functionality palette. Visual programming FTW!
Cheers,
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Tue, 8
the Beam source code to figure something out.
Cheers,
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Mon, Jan 7, 2019 at 23:09, Pablo Estrada wrote:
> Hi Matt,
> is this computation running as part of a larger pipeline that does ru
readers or writers,
what is a good alternative for ParDo?
Thanks in advance for any suggestions!
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
ou want to have it (it's not in a permanent
location, no need to steer future readers in the wrong direction).
Cheers,
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect at Neo4j, Kettle Project Founder
Great news! Congratulations!
My experience venturing into the world of Apache Beam couldn't possibly
have been nicer. Thank you to all involved!
---
Matt
On Fri, Dec 14, 2018 at 04:42, Chamikara Jayalath wrote:
> The Apache Beam team is pleased to announce the release of version 2.9.0!
>
> Apac
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Fri, Nov 30, 2018 at 12:51, Matt Casters wrote:
> I just wanted to thank you again. I split up my project into the Beam core
> stuff and my plugin. This got rid of a number of circular dependency
> is
London, February 7th
https://docs.google.com/presentation/d/1sXjpOOYqjnA-b8rl9tUktDJwMIW3nQ7mUVoU3q7PJmk/edit?usp=drivesdk
On Wed, Dec 5, 2018 at 22:23, Griselda Cuevas wrote:
> Where are you going to present them, Matt?
>
>
>
> On Wed, 5 Dec 2018 at 02:24, Matt Casters wrote:
>
>
It has been a lot of fun to see Kettle transformations run on Beam. I
think it will blow a lot of people's minds to actually see it in action
when I present the results in February.
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founde
There are probably smarter people than me on this list but since I've recently
been through a similar thought exercise...
For the generic use in Kettle I have a PCollection<KettleRow> going through
the pipeline.
KettleRow is just an Object[] wrapper for which I can implement a Coder.
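
To make the idea of an Object[] wrapper with its own encoding concrete, here is a minimal plain-Java sketch. It is an assumption-laden illustration, not the actual KettleRow coder: `Row`, `encode`, and `decode` are hypothetical names, it uses `java.io` streams rather than Beam's Coder API, and it only handles a handful of value types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class RowCodecSketch {

  /** Hypothetical Object[] wrapper; not the real KettleRow class. */
  public static final class Row {
    private final Object[] values;

    public Row(Object... values) {
      this.values = values;
    }

    public Object[] getValues() {
      return values;
    }
  }

  // One type tag per value tells the decoder what to read next.
  private static final byte NULL = 0, STRING = 1, LONG = 2, DOUBLE = 3, BOOLEAN = 4;

  public static byte[] encode(Row row) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(row.getValues().length);
    for (Object v : row.getValues()) {
      if (v == null) {
        out.writeByte(NULL);
      } else if (v instanceof String) {
        out.writeByte(STRING);
        out.writeUTF((String) v); // writeUTF is limited to 65535 encoded bytes
      } else if (v instanceof Long) {
        out.writeByte(LONG);
        out.writeLong((Long) v);
      } else if (v instanceof Double) {
        out.writeByte(DOUBLE);
        out.writeDouble((Double) v);
      } else if (v instanceof Boolean) {
        out.writeByte(BOOLEAN);
        out.writeBoolean((Boolean) v);
      } else {
        throw new IOException("Unsupported type: " + v.getClass().getName());
      }
    }
    out.flush();
    return bytes.toByteArray();
  }

  public static Row decode(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    Object[] values = new Object[in.readInt()];
    for (int i = 0; i < values.length; i++) {
      byte tag = in.readByte();
      switch (tag) {
        case NULL: values[i] = null; break;
        case STRING: values[i] = in.readUTF(); break;
        case LONG: values[i] = in.readLong(); break;
        case DOUBLE: values[i] = in.readDouble(); break;
        case BOOLEAN: values[i] = in.readBoolean(); break;
        default: throw new IOException("Unknown type tag: " + tag);
      }
    }
    return new Row(values);
  }

  public static void main(String[] args) throws IOException {
    Row copy = decode(encode(new Row("Matt", 42L, 3.5, true, null)));
    System.out.println(Arrays.toString(copy.getValues()));
  }
}
```

Tagging each value keeps the format self-describing, which suits generic rows whose column types are not known at compile time; a coder that knows the row layout up front could skip the tags.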
The "group by" that I impl
anticipated I must admit.
I'm in awe of how clean and intuitive the Beam API is (once you get the
hang of it).
Thanks for everything!
https://github.com/mattcasters/kettle-beam-core
https://github.com/mattcasters/kettle-beam
Cheers,
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution A
not an expert on the details of how this works,
>> but you'll probably have to make sure these filesystem dependencies
>> are in your custom classloader's jar.
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/s
here.
Thanks a lot for any answers or tips you might have!
Matt
---
Matt Casters attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder