Re: Looping timer blog

2019-06-26 Thread Reza Rokni
On Tue, 25 Jun 2019 at 21:20, Jan Lukavský wrote: > > On 6/25/19 1:43 PM, Reza Rokni wrote: > > > > On Tue, 25 Jun 2019 at 18:12, Jan Lukavský wrote: > >> > The TTL check would be in the same Timer rather than a separate Timer. >> The max value processed in each OnTimer call would be stored in

Re: Return types of Write transforms (aka best way to signal)

2019-06-26 Thread Reza Rokni
The use case of a transform waiting for a Sink or Sinks to complete is very interesting indeed! Curious, if a sink internally makes use of a Global Window with processing time triggers to push its writes, what mechanism could be used to release a transform waiting for a signal from the Sink(s)

Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Robert Bradshaw
On Thu, Jun 27, 2019 at 1:52 AM Rui Wang wrote: >> >> >> AFAIK all streaming runners today practically do provide these panes in >> order; > > Does it refer to "the stage immediately after GBK itself processes fired > panes in order" in streaming runners? Could you share more information? > >

Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Rui Wang
> > > AFAIK all streaming runners today practically do provide these panes in > order; > Does it refer to "the stage immediately after GBK itself processes fired panes in order" in streaming runners? Could you share more information? > this means that likely many users implicitly rely on this

Re: Spotless exclusions

2019-06-26 Thread Lukasz Cwik
On Wed, Jun 26, 2019 at 4:22 PM Anton Kedin wrote: > Currently our spotless is configured globally [1] (for java at least) to > include all source files by '**/*.java'. And then we exclude things > explicitly. Don't know why, but these exclusions are ignored for me > sometimes, for example

Spotless exclusions

2019-06-26 Thread Anton Kedin
Currently our spotless is configured globally [1] (for java at least) to include all source files by '**/*.java'. And then we exclude things explicitly. Don't know why, but these exclusions are ignored for me sometimes, for example `./gradlew :sdks:java:core:spotlessJavaCheck` always fails when

Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Reuven Lax
Correct, however I think our triggering model is close to useless (or at least close to unusable) without such a guarantee, for both accumulating and discarding. What's worse - AFAIK all streaming runners today practically do provide these panes in order; this means that likely many users

Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Rui Wang
Thanks! That thread was really helpful! -Rui On Wed, Jun 26, 2019 at 1:18 PM Steve Niemitz wrote: > There was a thread about this a few months ago as well: > > https://lists.apache.org/thread.html/20d11046d26174969ef44a781e409a1cb9f7c736e605fa40fdf98397@%3Cuser.beam.apache.org%3E > > > On Wed,

Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Steve Niemitz
There was a thread about this a few months ago as well: https://lists.apache.org/thread.html/20d11046d26174969ef44a781e409a1cb9f7c736e605fa40fdf98397@%3Cuser.beam.apache.org%3E On Wed, Jun 26, 2019 at 4:02 PM Robert Bradshaw wrote: > There is no promise that panes will arrive in order

Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Robert Bradshaw
There is no promise that panes will arrive in order (especially the further you get "downstream"). Though they may be approximately so, it's dangerous to assume that. You can inspect the sequential index in PaneInfo to determine whether a pane is older than other panes you have seen. On Wed, Jun
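
A minimal sketch of the check described above, in plain Python rather than Beam code. It assumes each pane carries a sequential index, as PaneInfo does, and that panes are compared per key and window. `LatestPaneTracker` and `is_stale` are hypothetical names for illustration, not part of the Beam API.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Tuple


@dataclass
class LatestPaneTracker:
    """Hypothetical helper: remembers the highest pane index seen per (key, window)."""

    latest: Dict[Tuple[Hashable, Hashable], int] = field(default_factory=dict)

    def is_stale(self, key: Hashable, window: Hashable, pane_index: int) -> bool:
        """Return True if a pane with an equal or higher index was already seen."""
        slot = (key, window)
        seen = self.latest.get(slot, -1)
        if pane_index <= seen:
            # An equally new or newer pane was processed earlier; this one arrived late.
            return True
        self.latest[slot] = pane_index
        return False


tracker = LatestPaneTracker()
# Panes for one key and window arriving out of order: index 0, then 2, then 1.
results = [tracker.is_stale("k", "w", i) for i in (0, 2, 1)]
```

In a real pipeline the per-key memory would have to live in state managed by the runner; the in-memory dictionary here only illustrates the comparison.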

Re: Return types of Write transforms (aka best way to signal)

2019-06-26 Thread Robert Bradshaw
Regarding Python, yes and no. Python doesn't distinguish at compile time between (1), (2), and (6), but that doesn't mean it isn't part of the public API and people might start counting on it, so it's in some sense worse. We can also do (3) (which is less cumbersome in Python, either returning a

Re: python integration tests flake detection

2019-06-26 Thread Udi Meiri
In lieu of doing a migration to pytest, which is a large effort, I'm trying to do the same using nose. Opened https://issues.apache.org/jira/browse/BEAM-7641 On Tue, Jun 25, 2019 at 4:01 PM Udi Meiri wrote: > I was thinking that our test infrastructure could use an upgrade to pytest. > > Some

Re: Using Grafana to display test metrics and alert anomalies

2019-06-26 Thread Mikhail Gryzykhin
Hi Łukasz, See answers inline. Regards, Mikhail. On Wed, Jun 26, 2019 at 7:47 AM Łukasz Gajowy wrote: > Hi Mikhail! > > Together with Kamil we're investigating the possibilities of creating > alerts for anomalies for the metrics collected from various tests (load, IO > tests, other performance

Re: Return types of Write transforms (aka best way to signal)

2019-06-26 Thread Chamikara Jayalath
BTW regarding Python SDK, I think the answer to this question is simpler for Python SDK due to the lack of types. Most examples I know just return a PCollection from the Write transform which may or may not be ignored by users. If the PCollection is used, the user should be aware of the element

Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Rui Wang
Hi Community, I am trying to understand the Beam model and have a question related to accumulating mode and panes: Accumulating mode means that every time a trigger fires, it emits all values seen so far in a window (so it's called accumulating); an example from the Beam programming model guide [1] sets

Re: Return types of Write transforms (aka best way to signal)

2019-06-26 Thread Chamikara Jayalath
On Wed, Jun 26, 2019 at 5:46 AM Robert Bradshaw wrote: > Good question. > > I'm not sure what could be done with (5) if it contains no deferred > objects (e.g. there's nothing to wait on). > > There is also (6) return PCollection. The > advantage of (2) is that one can migrate to (1) or (6)

Using Grafana to display test metrics and alert anomalies

2019-06-26 Thread Łukasz Gajowy
Hi Mikhail! Together with Kamil we're investigating the possibilities of creating alerts for anomalies for the metrics collected from various tests (load, IO tests, other performance tests). This is unfortunately impossible to do in Perfkit explorer tool that we're using for displaying the

Re: Plan for dropping python 2 support

2019-06-26 Thread Robert Bradshaw
On Sat, Jun 22, 2019 at 1:09 AM Valentyn Tymofieiev wrote: > > On Tue, Jun 18, 2019 at 2:01 PM Ahmet Altay wrote: >> >> Thank you for the update, very helpful. It might be worthwhile to share a >> version of this with user mailing list after 2.14. > > > I think so too, we can send an update to

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Reuven Lax
Earlier than the input watermark only applies to event time timers, but the above problem holds for processing time timers as well. On Wed, Jun 26, 2019, 1:50 PM Robert Bradshaw wrote: > Yeah, it wouldn't be optimal performance-wise, but I think it's good > to keep the bar for a correct SDK

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Jan Lukavský
That sounds very interesting. I will update the JIRA with link to this discussion and then I will have a look if this can be easily implemented in the DirectRunner. Thanks for this discussion! On 6/26/19 2:50 PM, Robert Bradshaw wrote: Yeah, it wouldn't be optimal performance-wise, but I

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Robert Bradshaw
Yeah, it wouldn't be optimal performance-wise, but I think it's good to keep the bar for a correct SDK low. Might still be better than sending one timer per bundle, and you only pay the performance if timers are set earlier than the input watermark (and there was a timer firing in this range).

Re: Return types of Write transforms (aka best way to signal)

2019-06-26 Thread Robert Bradshaw
Good question. I'm not sure what could be done with (5) if it contains no deferred objects (e.g. there's nothing to wait on). There is also (6) return PCollection. The advantage of (2) is that one can migrate to (1) or (6) without changing the public API, while giving something to wait on without

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Reuven Lax
This would have a lot of performance problems (especially since there is user code that caches within a bundle, and invalidates the cache at the end of every bundle). However this would be a valid "lazy" implementation. On Wed, Jun 26, 2019 at 2:29 PM Robert Bradshaw wrote: > Note also that a

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Robert Bradshaw
Note also that a "lazy" SDK implementation would be to simply return all the timers (as if they were new timers) to runner once a timer set (before or at the last requested timer in the bundle) is encountered. E.g. Suppose we had timers T1, T3, T5 in the bundle. On firing T1, we set T2 and delete

Return types of Write transforms (aka best way to signal)

2019-06-26 Thread Ismaël Mejía
Beam introduced in version 2.4.0 the Wait transform to delay processing of each window in a PCollection until signaled. This opened new interesting patterns for example writing to a database and when ‘fully’ done write to another database. To support this pattern an IO connector Write transform

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Reuven Lax
I like this option the best. It might be trickier to implement, but seems like it would be the most consistent solution. Another problem it would solve is the following: let's say a bundle arrives containing timers T1 and T2, and while processing T1 the user code deletes T2 (or resets it to a

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Jan Lukavský
Don't get me wrong, I think that this would be the best option, I just discarded it at the beginning. But maybe it can be done, but I'm not able to tell. At least what I saw in DirectRunner, I think that it works so that first timers are extracted from TimerInternals and then executed,

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Robert Bradshaw
Bundles would still be immutable pieces of work. E.g. in this case, T2 would never be sent to the runner. On Wed, Jun 26, 2019 at 1:02 PM Jan Lukavský wrote: > > I think that this approach breaks the assumption that bundles are > executed as immutable pieces of work. This way, runners would have

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Jan Lukavský
I think that this approach breaks the assumption that bundles are executed as immutable pieces of work. This way, runners would have to update the runner while executing it. It is another possible option, but seems to have issues of its own. On 6/26/19 12:28 PM, Robert Bradshaw wrote:

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Robert Bradshaw
Another option, that is nice from an API perspective but places a burden on SDK implementers (and possibly runners), is to maintain the ordering of timers by requiring timers to be fired in order, and if any timers are set to fire them immediately before processing later timers. In other words, if
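
A rough sketch of the in-order firing idea, in plain Python rather than Beam code. It assumes a timer is a (timestamp, name) pair and that the user callback returns any timers it sets. `fire_in_order` and `on_timer` are hypothetical names, and timer deletion and watermarks are deliberately left out.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Timer = Tuple[int, str]  # (timestamp, name); a simplification for illustration


def fire_in_order(
    initial: Iterable[Timer],
    on_timer: Callable[[Timer], Iterable[Timer]],
) -> List[str]:
    """Fire timers by ascending timestamp, merging in timers set while firing."""
    heap = list(initial)
    heapq.heapify(heap)
    fired: List[str] = []
    while heap:
        timer = heapq.heappop(heap)
        fired.append(timer[1])
        # Timers set by the callback join the same queue, so a newly set
        # earlier timer fires before the later timers already in the bundle.
        for new_timer in on_timer(timer):
            heapq.heappush(heap, new_timer)
    return fired


def on_timer(timer: Timer) -> List[Timer]:
    # Assumed user logic for this example: firing T1 sets a new timer T2.
    return [(2, "T2")] if timer[1] == "T1" else []


fired = fire_in_order([(1, "T1"), (3, "T3"), (5, "T5")], on_timer)
```

With these inputs the newly set T2 fires between T1 and T3, which is the ordering the proposal asks SDKs to maintain.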

[DISCUSS] Solving timer ordering on immutable bundles

2019-06-26 Thread Jan Lukavský
Hi, I have mentioned an issue I have come across [1] on several other threads, but it probably didn't attract the attention that it deserves. I will try to restate the problem here for clarity: - on runners that use the concept of bundles (the original issue mentions DirectRunner, but it

Re: Blogpost Beam Summit 2019

2019-06-26 Thread Robert Bradshaw
Thanks. This is a great write-up. +1 to an official tweet. On Wed, Jun 26, 2019, 6:11 AM Reza Rokni wrote: > Thank you for putting this together! > > On Wed, 26 Jun 2019 at 01:23, Ahmet Altay wrote: > >> Thank you for writing and sharing this. I enjoyed reading it :) I think >> it is worth