Re: Ordering of element timestamp change and window function

2020-01-22 Thread Jan Lukavský
Hi Kenn, I do not agree with the last part. We are talking about the definition of semantics. If GBK can be implemented on top of stateful dofn, then stateful dofn is the more generic transform. Therefore, semantics should be defined on this transform, and _derived_ (or transferred) to the less

Re: help with this error, please

2020-01-22 Thread Tomo Suzuki
Hi Vasu, (Ignore my message if Luke's advice resolves the issue already) Could you add the entire error message, if any? A NoClassDefFoundError is usually caused by a ClassNotFoundException saying a class is not found. I don't see the missing class name in your stacktrace. I'll need (1) the entire

Re: [Discuss] Beam Summit 2020 Dates & locations

2020-01-22 Thread Chad Dombrova
Hi all, Did we come to a consensus on dates and locations for the summits? Particularly interested in the North America Summit. Thanks, -chad On Tue, Nov 26, 2019 at 7:26 AM Alexey Romanenko wrote: > Probably, it would make sense to wait a bit for October (or September) > dates since the

Re: help with this error, please

2020-01-22 Thread Luke Cwik
Boolean properties only allow getYYY, isYYY, and setYYY; you can't use "should". I think you should have gotten a better error message, though, so it's likely something else is not working for you. How are you trying to run the test? All pipeline options use a global namespace so UseGrpc will
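The naming rule mentioned above can be illustrated with plain JavaBeans introspection. This is only a minimal sketch: it uses the JDK's java.beans.Introspector, not Beam's own PipelineOptions validation (which may apply additional rules), and the class and interface names below are hypothetical, chosen for illustration.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Beam.
public class BeanNamingSketch {

  // Follows the is/set convention, so introspection sees a "useGrpc" property.
  public interface ConventionalOptions {
    boolean isUseGrpc();

    void setUseGrpc(boolean value);
  }

  // Uses a "should" prefix, which the JavaBeans convention does not recognize.
  public interface ShouldOptions {
    Boolean shouldUseGrpc();
  }

  // Returns the names of the bean properties the JDK discovers on the given type.
  public static List<String> propertyNames(Class<?> type) {
    try {
      List<String> names = new ArrayList<>();
      for (PropertyDescriptor pd : Introspector.getBeanInfo(type).getPropertyDescriptors()) {
        names.add(pd.getName());
      }
      return names;
    } catch (IntrospectionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("is/set style: " + propertyNames(ConventionalOptions.class));
    System.out.println("should style: " + propertyNames(ShouldOptions.class));
  }
}
```

Under these assumptions, renaming the accessor to a get/is form and adding a matching setter is the kind of change the reply suggests; whether that alone resolves the reported error is not established in the thread.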

Re: [DISCUSS] Autoformat python code with Black

2020-01-22 Thread Udi Meiri
Sorry, backing off on this due to time constraints. On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri wrote: > It sounds like there's a consensus for yapf. I volunteer to take this on > > On Wed, Jan 22, 2020, 10:31 Udi Meiri wrote: > >> +1 to autoformatting >> >> On Wed, Jan 22, 2020 at 9:57 AM Luke

Re: help with this error, please

2020-01-22 Thread Vasu Nori
Sorry, I didn't realize I posted a screenshot link that wasn't visible outside google.com. Here is the code I was trying to add to GcsOptions.java: 1. @Description("Whether to use gRPC or not, as transport mechanism.") 2. @Default.Boolean(false) 3. Boolean shouldUseGrpc(); 4.

Re: [DISCUSS] Autoformat python code with Black

2020-01-22 Thread Udi Meiri
It sounds like there's a consensus for yapf. I volunteer to take this on On Wed, Jan 22, 2020, 10:31 Udi Meiri wrote: > +1 to autoformatting > > On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik wrote: > >> +1 to autoformatters. Also the Beam Java SDK went through a one time pass >> to apply the

Re: Ordering of element timestamp change and window function

2020-01-22 Thread Kenneth Knowles
Had a lunch chat about this issue. Moving elements back in time can make them late or droppable. You just can't really do it safely. Moving elements into the future is fine up to the end of the window. It is not safe to move further. The watermark for a PCollection is based on the element

help with this error, please

2020-01-22 Thread Vasu Nori
Hello I am trying to add a new property to this file like this but this results in

Re: [VOTE] Release 2.18.0, release candidate #1

2020-01-22 Thread Udi Meiri
Thomas, please let us know if you learn more about possible root causes of the regression you're seeing. Also, if you believe this should block the release then please vote -1. Does Beam have performance tests for the Python Flink portable streaming case? On Wed, Jan 22, 2020 at 8:08 AM

Re: [DISCUSS] Autoformat python code with Black

2020-01-22 Thread Udi Meiri
+1 to autoformatting On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik wrote: > +1 to autoformatters. Also the Beam Java SDK went through a one time pass > to apply the spotless formatting. > > On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay wrote: > >> +1 to autoformatters and yapf. It appears to be a

A new reworked Elasticsearch 7+ IO module

2020-01-22 Thread Ludovic Boutros
Dear all, I have written a completely reworked Elasticsearch 7+ IO module. It can be found here: https://github.com/ludovic-boutros/beam/tree/fresh-reworked-elasticsearch-io-v7/sdks/java/io/elasticsearch7 This is a fairly advanced WIP, but I'm quite a new user of Apache Beam and I would like

Re: [DISCUSS] Autoformat python code with Black

2020-01-22 Thread Luke Cwik
+1 to autoformatters. Also the Beam Java SDK went through a one time pass to apply the spotless formatting. On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay wrote: > +1 to autoformatters and yapf. It appears to be a well maintained project. > I do support making a one time pass to apply formatting

Re: Updating Metrics Counter in user defined thread

2020-01-22 Thread Luke Cwik
I think any approach where we allow asynchronous processing in another thread needs a holistic approach beyond counters, since many things that the thread may want to have access to (user state, side inputs, counters, producing output) were intentionally set up to not be thread safe due to the cost

Re: [VOTE] Release 2.18.0, release candidate #1

2020-01-22 Thread Jean-Baptiste Onofré
+1 (binding) Quickly tested on beam-samples. Regards JB On 22/01/2020 16:33, Ismaël Mejía wrote: > +1 (binding) > > - Validated signatures > - Run Python wordcount on Direct runner (from wheels) > - Run Python wordcount on Flink runner with job-server image (via wheels) > - Run Python

Re: [VOTE] Release 2.18.0, release candidate #1

2020-01-22 Thread Thomas Weise
When trying to upgrade our fork from 2.16 to 2.18, we see a significant performance degradation. This applies to Flink portable streaming with the Python SDK. I don't know what the cause is yet. If anyone else has done validation with a similar setup and this RC, it would be good to know the

Re: [VOTE] Release 2.18.0, release candidate #1

2020-01-22 Thread Valentyn Tymofieiev
+1. In addition to checks performed earlier, re-ran streaming quickstarts and batch mobile gaming examples on Dataflow runner after containers were updated. On Wed, Jan 22, 2020 at 7:33 AM Ismaël Mejía wrote: > +1 (binding) > > - Validated signatures > - Run Python wordcount on Direct runner

Re: [VOTE] Release 2.18.0, release candidate #1

2020-01-22 Thread Ismaël Mejía
+1 (binding) - Validated signatures - Run Python wordcount on Direct runner (from wheels) - Run Python wordcount on Flink runner with job-server image (via wheels) - Run Python wordcount on Spark runner with job-server from source (via wheels) - Validate no regressions on Nexmark for Spark

Re: [VOTE] Release 2.18.0, release candidate #1

2020-01-22 Thread Alexey Romanenko
Agree with Ahmet and Robert - IMO, this is not a blocker for 2.18. Sorry for messing things up a bit with this commit, we can revert it from 2.18 branch if it’s needed. > On 21 Jan 2020, at 22:48, Robert Bradshaw wrote: > > On Tue, Jan 21, 2020 at 12:04 PM Ahmet Altay wrote: >> >> This

Re: Ordering of element timestamp change and window function

2020-01-22 Thread Jan Lukavský
I sense this discussion might be (remotely) related to [1] (and especially [2]). The common ground here is that we need a sound definition of window. I think people might currently have different definitions, which leads to this sort of misunderstanding. The definition should be created