Re: 2019 Beam Events

2018-12-04 Thread Matthias Baetens
Great stuff, Gris! Looking forward to what 2019 will bring! The Beam meetup in London will have a new get together early next year as well :-) https://www.meetup.com/London-Apache-Beam-Meetup/ On Tue, 4 Dec 2018 at 23:50 Austin Bennett wrote: > Already got that process kicked off with the NY a

Re: How to kick off a Beam pipeline with PortableRunner in Java?

2018-12-04 Thread Ruoyun Huang
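This should work, but maybe try adding a section like this to your pom.xml file: <profile> <id>portable-runner</id> <activation> <activeByDefault>true</activeByDefault> </activation> <dependencies> <dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-runners-reference-java</artifactId> <version>${beam.version}</version> <scope>runtime</scope> </dependency> </dependencies> </profile>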

Re: 2019 Beam Events

2018-12-04 Thread Austin Bennett
Already got that process kicked off with the NY and LA meetups; now that SF is about to be inaugurated, the goal will be to get these moving as well. For anyone that is in (or goes to) those areas: https://www.meetup.com/New-York-Apache-Beam/ https://www.meetup.com/Los-Angeles-Apache-Beam/ Please reac

Re: 2019 Beam Events

2018-12-04 Thread Griselda Cuevas
+1 to Pablo's suggestion: if there's interest in founding a Meetup group in a particular city, let's create the Meetup page and start getting sign-ups. Joana will be reaching out with a comprehensive list of how to get started and we're hoping to compile a high level calendar of launches/announcem

Re: Join PCollection Data with HBase Large Data - Suggestion Requested

2018-12-04 Thread Chandan Biswas
Thanks Lukasz for the quick reply. On Tue, Dec 4, 2018 at 4:20 PM Lukasz Cwik wrote: > Is HBase only updated by the output within your pipeline or can an > external system also update the HBase data? If no, then querying HBase > within processElement is your best bet since you're effectively trying to

Re: How to kick off a Beam pipeline with PortableRunner in Java?

2018-12-04 Thread Sai Inampudi
Thanks for the help Ankur and Ruoyun, appreciate it. I went through the wiki and I am still facing the same issue as before (where it complains about the following: java.lang.IllegalArgumentException: Unknown 'runner' specified 'PortableRunner', supported pipeline runners [DirectRunner, FlinkRun

Re: Join PCollection Data with HBase Large Data - Suggestion Requested

2018-12-04 Thread Lukasz Cwik
Is HBase only updated by the output within your pipeline or can an external system also update the HBase data? If no, then querying HBase within processElement is your best bet since you're effectively trying to do a sparse lookup with slowly changing data. On Tue, Dec 4, 2018 at 11:59 AM Chandan

Re: 2019 Beam Events

2018-12-04 Thread Daniel Salerno
=) What good news! Okay, I'll set up the group and try to get people interested. Thank you On Tue, 4 Dec 2018 at 17:19, Pablo Estrada wrote: > FWIW, for some of these places that have interest (e.g. Brazil, Israel), > it's possible to create a group in meetup.com, and start gauging > interest

Re: Join PCollection Data with HBase Large Data - Suggestion Requested

2018-12-04 Thread Chandan Biswas
Also I forgot to mention that keys will not be repeating frequently in a window. Thanks, Chandan On Tue, Dec 4, 2018 at 1:49 PM Chandan Biswas wrote: > Thanks Lukasz and Steve for replying quickly. Sorry for not being clear > enough. But my use case is something like Steve mentioned. So when I am

Re: Join PCollection Data with HBase Large Data - Suggestion Requested

2018-12-04 Thread Chandan Biswas
Thanks Lukasz and Steve for replying quickly. Sorry for not being clear enough. But my use case is something like Steve mentioned. So when I am reading the data from the stream, I need to figure out that the data coming from the stream is not a duplicate for the key. So I need to check all the historical

Re: Dynamic Naming of file using KV in IO

2018-12-04 Thread Chamikara Jayalath
Not sure if I fully understood your question, but will it be possible to send the PCollection that you read from XmlIO through a ParDo to generate the PCollection> that you need ? If this doesn't work, you can also try reading XML files directly from a ParDo and generating the PCollection that you

Re: 2019 Beam Events

2018-12-04 Thread Pablo Estrada
FWIW, for some of these places that have interest (e.g. Brazil, Israel), it's possible to create a group in meetup.com, and start gauging interest, and looking for organizers. Once a group of people with interest exists, it's easier to get interest / sponsorship to bring speakers. So if you are wil

Re: 2019 Beam Events

2018-12-04 Thread Daniel Salerno
It's a shame that there are no events in Brazil ... =( On Tue, 4 Dec 2018 at 13:12, OrielResearch Eila Arich-Landkof < e...@orielresearch.org> wrote: > agree 👍 > > On Tue, Dec 4, 2018 at 5:41 AM Chaim Turkel wrote: > >> Israel would be nice to have one >> chaim >> On Tue, Dec 4, 2018 a

Beam Metrics using FlinkRunner

2018-12-04 Thread Phil Franklin
I’m having difficulty accessing Beam metrics when using FlinkRunner in streaming mode. I don’t get any metrics from MetricsPusher, though the same setup delivered metrics from SparkRunner. Probably for the same reason that MetricsPusher doesn’t work, I also don’t get any output when I call an i

Re: bean elasticsearch connector for dataflow

2018-12-04 Thread Adeel Ahmad
Hello, Thanks - just seen this, updated support for ES 6.3.2... https://beam.apache.org/blog/2018/10/29/beam-2.8.0.html Adeel On Tue, 4 Dec 2018 at 17:43, Tim wrote: > Beam 2.8.0 brought in support for ES 6.3.x > > I’m not sure if that works against a 6.5.x server but I could imagine it > d

Re: bean elasticsearch connector for dataflow

2018-12-04 Thread Tim
Beam 2.8.0 brought in support for ES 6.3.x I’m not sure if that works against a 6.5.x server but I could imagine it does. Tim, Sent from my iPhone > On 4 Dec 2018, at 18:28, Adeel Ahmad wrote: > > Hello, > > I am trying to use gcp dataflow for indexing data from pubsub into > elasticsearch.

Re: Join PCollection Data with HBase Large Data - Suggestion Requested

2018-12-04 Thread Steve Niemitz
Interesting to know that the state scales so well! On Tue, Dec 4, 2018 at 12:21 PM Lukasz Cwik wrote: > You're correct in saying that StatefulDoFn is pointless if you only see > every key+window once. The user's description wasn't exactly clear but it > seemed to me they were reading from a stream

bean elasticsearch connector for dataflow

2018-12-04 Thread Adeel Ahmad
Hello, I am trying to use gcp dataflow for indexing data from pubsub into elasticsearch. Does dataflow (which uses beam) now support elasticsearch 6.5.x or does it still only support 5.6.x? -- Thanks, Adeel

Re: Join PCollection Data with HBase Large Data - Suggestion Requested

2018-12-04 Thread Lukasz Cwik
You're correct in saying that StatefulDoFn is pointless if you only see every key+window once. The user's description wasn't exactly clear but it seemed to me they were reading from a stream and wanted to store all old values that they had seen implying they see keys more than once. The user would nee
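
The point about per-key state only paying off when a key is seen more than once can be illustrated with a minimal plain-Java sketch. This is not Beam's state API: the class `PerKeyDeduper` and its methods are hypothetical names, and an in-memory map stands in for the per-key state that a stateful DoFn would keep.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java stand-in for per-key state; hypothetical, not a Beam API. */
final class PerKeyDeduper {
    // Values already seen, partitioned by key, mimicking per-key state.
    private final Map<String, Set<String>> seenByKey = new HashMap<>();

    /** Returns true the first time (key, value) is observed, false on repeats. */
    boolean isFirstSighting(String key, String value) {
        return seenByKey.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    /** Keeps only the first occurrence of each (key, value) pair, in input order. */
    List<String[]> dedupe(List<String[]> keyedValues) {
        List<String[]> out = new ArrayList<>();
        for (String[] kv : keyedValues) {
            if (isFirstSighting(kv[0], kv[1])) {
                out.add(kv);
            }
        }
        return out;
    }
}
```

If every key shows up exactly once, the map only ever grows and never filters anything, which is the "pointless" case described in the message.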

Re: Generic Type PTransform

2018-12-04 Thread Lukasz Cwik
There are various strategies that you can use depending on what you know with the worst case being that you have to ask the person using the PTransform to give you a K and V coder or a concrete type descriptor for K and V which would allow you to get the coder from the coder registry. The Apache B

Re: No Translator Found issue

2018-12-04 Thread Vinay Patil
Thank you Juan, adding the following worked for me: <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> Regards, Vinay Patil On Mon, Dec 3, 2018 at 11:21 PM Juan Carlos Garcia wrote: > Hi Vinay, > > When generating your Fatjar make sure you are mer

Re: Join PCollection Data with HBase Large Data - Suggestion Requested

2018-12-04 Thread Steve Niemitz
We have a similar use case, except with BigtableIO instead of HBase. We ended up building a custom transform that was basically PCollection[ByteString] -> PCollection[BigtableRow], and would fetch rows from Bigtable based on the input, however it's tricky to get right because of batching, etc. I'
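
One reason batching is tricky is that the last, partially filled batch is easy to forget. The following plain-Java sketch shows the buffering pattern under stated assumptions: `BatchingLookup` is a hypothetical helper, and `bulkFetch` is a placeholder for whatever multi-row read the store client offers; it is not the transform described in the message. In a Beam DoFn, the explicit `flush()` call would typically belong in the `@FinishBundle` method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Buffers keys and looks them up in batches; hypothetical helper, not a Beam or Bigtable API. */
final class BatchingLookup<K, V> {
    private final int maxBatchSize;
    // Placeholder for the store client's multi-key read (an assumption).
    private final Function<List<K>, Map<K, V>> bulkFetch;
    private final List<K> pending = new ArrayList<>();
    private final Map<K, V> results = new LinkedHashMap<>();
    private int fetchCalls = 0;

    BatchingLookup(int maxBatchSize, Function<List<K>, Map<K, V>> bulkFetch) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.bulkFetch = bulkFetch;
    }

    /** Queues a key and triggers a fetch once the batch is full. */
    void add(K key) {
        pending.add(key);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Fetches whatever is still pending; must be called once input is exhausted. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        results.putAll(bulkFetch.apply(new ArrayList<>(pending)));
        pending.clear();
        fetchCalls++;
    }

    Map<K, V> results() {
        return results;
    }

    int fetchCalls() {
        return fetchCalls;
    }
}
```

Without the final `flush()`, any keys left in a partial batch would silently never be looked up.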

Re: 2019 Beam Events

2018-12-04 Thread OrielResearch Eila Arich-Landkof
agree 👍 On Tue, Dec 4, 2018 at 5:41 AM Chaim Turkel wrote: > Israel would be nice to have one > chaim > On Tue, Dec 4, 2018 at 12:33 AM Griselda Cuevas wrote: > > > > Hi Beam Community, > > > > I started curating industry conferences, meetups and events that are > relevant for Beam, this initia

Re: Latin America Community

2018-12-04 Thread Eryx
Hi Leonardo, I'm Héctor Eryx from Guadalajara, México. I'm currently using Beam for personal projects, plus giving some training/mentoring on how to use it to local communities. Also, I'm in touch with some friends at IBM Mexico who are using Beam to run data storage events analysis. We are few, bu

Re: 2019 Beam Events

2018-12-04 Thread Dan
The next Pentaho London meetup has a presentation on using Kettle with Beam: https://www.meetup.com/Pentaho-London-User-Group/events/256773962/ Thanks, Dan On Tue, 4 Dec 2018 at 13:20, Maximilian Michels wrote: > Thanks for sharing, Gris! This list will likely never be complete, as > there are

Re: 2019 Beam Events

2018-12-04 Thread Maximilian Michels
Thanks for sharing, Gris! This list will likely never be complete, as there are endless conferences :) Nevertheless, it's a great idea to coordinate the attendance for the major ones. Cheers, Max On 03.12.18 23:33, Griselda Cuevas wrote: Hi Beam Community, I started curating industry confe

Re: Graceful shutdown of long-running Beam pipeline on Flink

2018-12-04 Thread Maximilian Michels
Thank you for sharing these, Lukasz! Great question, Wayne! As for pipeline shutdown, Flink users typically take a snapshot and then cancel the pipeline with Flink tools. The Beam tooling needs to be improved to support cancelling as well. If snapshotting is enabled, the Beam job could also

Latin America Community

2018-12-04 Thread Leonardo Miguel
Hi guys, Just want to check if there is someone using Beam and/or Scio at this side of the globe. I'd like to know also if there is any event near or some related community. If you are using Beam and/or Scio, please let me know. Let me start first: I'm located at Sao Carlos, Sao Paulo, Brazil. We

RE: Generic Type PTransform

2018-12-04 Thread Eran Twili
Thanks Lukasz, Please tell me, how can I set a coder on the PCollection created after the "MapToKV" apply? I mean, all I know is that it will be a PCollection<KV<K, V>>, and I don't know what the actual runtime types of K and V will be. So, what coder should I set? Can you please give a code example of h