Re: bug in context.timestamp() after GroupByKey transform?

2018-12-03 Thread Robert Bradshaw
After a GroupByKey, a (single) timestamp needs to be assigned to the full KV<K, Iterable<V>> element. By default the timestamp chosen is the end of the window, which in the case of the global window is a timestamp as far into the future as can be represented. (Python prints these as MAX_TIMESTAMP rather than an e

Re: Beam Metrics questions

2018-12-03 Thread Etienne Chauchot
Hi Phil, Thanks for the update. I was checking the code and could not understand how the filtering could fail. Etienne On Friday, 30 November 2018 at 10:53 -0600, Phil Franklin wrote: > Etienne, I’ve just discovered that the code I used for my tests overrides the > command-line arguments, a

Re: Beam Metrics questions

2018-12-03 Thread Etienne Chauchot
Hi Phil, No, Spark does not support committed metrics either. Only Dataflow supports them, and only in batch mode. All runners other than the Direct Runner and Dataflow use AccumulatedMetricResult, which throws an exception if the user requests committed metrics; see for example https://github.com/apa

Re: Graceful shutdown of long-running Beam pipeline on Flink

2018-12-03 Thread Wayne Collins
Hi JC, Thanks for the quick response! I had hoped for an in-pipeline solution for runner portability but it is nice to know we're not the only ones stepping outside to interact with runner management. :-) Wayne On 2018-12-03 01:23, Juan Carlos Garcia wrote: > Hi Wayne,  > > We have the same set

Re: Graceful shutdown of long-running Beam pipeline on Flink

2018-12-03 Thread Lukasz Cwik
There are proposals for pipeline drain[1] and also for snapshot and update[2] for Apache Beam. We would love contributions in this space. 1: https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8 2: https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p

Re: Generic Type PTransform

2018-12-03 Thread Lukasz Cwik
Apache Beam attempts to propagate coders by looking at any typing information available, but because Java has a lot of type erasure, there are many scenarios where these coders can't be propagated forward from a previous transform. Take the following two examples (note that there are man

Re: Graceful shutdown of long-running Beam pipeline on Flink

2018-12-03 Thread Wayne Collins
Excellent proposals! They go beyond our requirements but would provide a great foundation for runner-agnostic life cycle management of pipelines. Will jump into discussion on the other side... Thanks! Wayne On 2018-12-03 11:53, Lukasz Cwik wrote: > There are proposals for pipeline drain[1] and

Re: Graceful shutdown of long-running Beam pipeline on Flink

2018-12-03 Thread Thomas Weise
As noted, there is currently no support for Flink savepoints through the Beam API. However, it is now possible to restore from a savepoint with a Flink runner specific pipeline option: https://issues.apache.org/jira/browse/BEAM-5396 https://github.com/apache/beam/pull/7169#issuecomment-443283332

Dynamic Naming of file using KV in IO

2018-12-03 Thread Vinay Patil
Hi, I need help with dynamic file naming for XML with a KV PCollection. PCollection<KV<String, XmlDTO>> xmlCollection =…. I am not able to use XmlIO for this PCollection. XmlDTO is the DTO being marshalled and String is the key. I tried using KV, but XmlIO needs a Class type and KV.getClass does not work… I need to

Join PCollection Data with HBase Large Data - Suggestion Requested

2018-12-03 Thread Chandan Biswas
Hello All, I have a use case where I have a PCollection of KV data coming from a Kafka source. When processing each record (KV) I need all old values for that key stored in an HBase table. The naive approach is to do an HBase lookup in DoFn.processElement. I considered a side input but it's not going to work be

How to kick off a Beam pipeline with PortableRunner in Java?

2018-12-03 Thread sai . inampudi
Hi everyone, Can someone point me to how to kick off a Beam pipeline using the PortableRunner (w/Flink) in Java? I saw some examples in Python but I haven't been able to find any for Java. I tried to modify the runner option to use PortableRunner but I get the following error: java.lang.

Re: How to kick off a Beam pipeline with PortableRunner in Java?

2018-12-03 Thread Ruoyun Huang
Maybe this helps: https://cwiki.apache.org/confluence/display/BEAM/Usage+Guide On Mon, Dec 3, 2018 at 2:10 PM sai.inamp...@gmail.com < sai.inamp...@gmail.com> wrote: > Hi everyone, > > Can someone point me to how to kick off a Beam pipeline using the > PortableRunner (w/Flink) in Java? I saw some

Re: How to kick off a Beam pipeline with PortableRunner in Java?

2018-12-03 Thread Ankur Goenka
Thanks Ruoyun! For Flink, we use a different job server, which you can start using "./gradlew beam-runners-flink_2.11-job-server:runShadow". The host:port for this job server is localhost:8099. On Mon, Dec 3, 2018 at 2:24 PM Ruoyun Huang wrote: > Maybe this helps: > https://cwiki.apache.org/conflu

2019 Beam Events

2018-12-03 Thread Griselda Cuevas
Hi Beam Community, I started curating industry conferences, meetups and events that are relevant for Beam; this is the initial list I came up with. *I'd love your help adding others that I might have overlook

No Translator Found issue

2018-12-03 Thread Vinay Patil
Hi, I am using Beam version 2.8.0. When I submit the pipeline to a Flink 1.5.2 cluster, I get the following exception: Caused by: java.lang.IllegalStateException: No translator known for org.apache.beam.sdk.io.Read$Bounded Can you please let me know what could be the problem? Regards, Vinay Pa

Re: No Translator Found issue

2018-12-03 Thread Juan Carlos Garcia
Hi Vinay, When generating your fat JAR, make sure you are merging the service files (META-INF/services) of your dependencies. Apache Beam relies heavily on the Java service locator to discover / register its components. JC On Tue, 4 Dec 2018, 03:18, Vinay Patil wrote: > Hi, > > I am
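
For readers building the fat JAR with the Maven Shade plugin (an assumption; the thread does not say which build tool is in use), a minimal sketch of merging the META-INF/services files could look like the fragment below. ServicesResourceTransformer is part of the Shade plugin; the plugin version is omitted here and would normally be pinned in the project's own POM.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Merges same-named META-INF/services files from all
               dependencies instead of letting one overwrite the others. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Without such a merge step, only one dependency's copy of each service file typically ends up in the JAR, so registrations from the other dependencies are lost, which would be consistent with the "No translator known" symptom described in this thread.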

Re: Join PCollection Data with HBase Large Data - Suggestion Requested

2018-12-03 Thread Lukasz Cwik
What about a StatefulDoFn where you append the value(s) in a bag state as you see them? If you need to seed the state information, you could do a one-time HBase lookup in processElement for each key that hasn't yet been seen (storing the fact that you loaded the data in a boolean) but aft
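
The per-key logic suggested above (seed once from the external store, then keep appending to a bag) can be sketched in plain Python. This is only a simulation of the idea, not Beam's state API: the class KeyedHistory, its process method, and fake_lookup are hypothetical names invented for illustration, and a dictionary stands in for the runner-managed per-key state.

```python
from collections import defaultdict


class KeyedHistory:
    """Plain-Python stand-in for per-key state: a bag of values plus a 'seeded' flag."""

    def __init__(self, lookup):
        self._lookup = lookup           # hypothetical one-time external lookup (e.g. HBase)
        self._bag = defaultdict(list)   # plays the role of the bag state
        self._seeded = set()            # plays the role of the boolean "already loaded" state

    def process(self, key, value):
        # Seed the bag from the external store only the first time a key is seen.
        if key not in self._seeded:
            self._bag[key].extend(self._lookup(key))
            self._seeded.add(key)
        # Old values are everything recorded before the current element.
        old_values = list(self._bag[key])
        self._bag[key].append(value)
        return old_values


calls = []


def fake_lookup(key):
    # Records each call so it is visible that the lookup runs once per key.
    calls.append(key)
    return {"a": [1, 2]}.get(key, [])


history = KeyedHistory(fake_lookup)
first = history.process("a", 3)
second = history.process("a", 4)
other = history.process("b", 9)
```

In this sketch the external store is consulted once per key, and every later element for that key is served from the accumulated bag.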