On Tue, 25 Jun 2019 at 21:20, Jan Lukavský wrote:
>
> On 6/25/19 1:43 PM, Reza Rokni wrote:
>
>
>
> On Tue, 25 Jun 2019 at 18:12, Jan Lukavský wrote:
>
>> > The TTL check would be in the same Timer rather than a separate Timer.
>> The max value processed in each OnTimer call would be stored in
The use case of a transform waiting for a Sink or Sinks to complete is
very interesting indeed!
Curious, if a sink internally makes use of a Global Window with processing
time triggers to push its writes, what mechanism could be used to release a
transform waiting for a signal from the Sink(s)?
On Thu, Jun 27, 2019 at 1:52 AM Rui Wang wrote:
>>
>>
>> AFAIK all streaming runners today practically do provide these panes in
>> order;
>
> Does it refer to "the stage immediately after GBK itself processes fired
> panes in order" in streaming runners? Could you share more information?
>
>
>
>
> AFAIK all streaming runners today practically do provide these panes in
> order;
>
Does it refer to "the stage immediately after GBK itself processes fired
panes in order" in streaming runners? Could you share more information?
> this means that likely many users implicitly rely on this
On Wed, Jun 26, 2019 at 4:22 PM Anton Kedin wrote:
> Currently our spotless is configured globally [1] (for java at least) to
> include all source files by '**/*.java'. And then we exclude things
> explicitly. Don't know why, but these exclusions are ignored for me
> sometimes, for example
Currently our spotless is configured globally [1] (for java at least) to
include all source files by '**/*.java'. And then we exclude things
explicitly. Don't know why, but these exclusions are ignored for me
sometimes, for example `./gradlew :sdks:java:core:spotlessJavaCheck` always
fails when
Correct, however I think our triggering model is close to useless (or at
least close to unusable) without such a guarantee, for both accumulating
and discarding. What's worse - AFAIK all streaming runners today
practically do provide these panes in order; this means that likely many
users
Thanks! That thread was really helpful!
-Rui
On Wed, Jun 26, 2019 at 1:18 PM Steve Niemitz wrote:
> There was a thread about this a few months ago as well:
>
> https://lists.apache.org/thread.html/20d11046d26174969ef44a781e409a1cb9f7c736e605fa40fdf98397@%3Cuser.beam.apache.org%3E
>
>
> On Wed,
There was a thread about this a few months ago as well:
https://lists.apache.org/thread.html/20d11046d26174969ef44a781e409a1cb9f7c736e605fa40fdf98397@%3Cuser.beam.apache.org%3E
On Wed, Jun 26, 2019 at 4:02 PM Robert Bradshaw wrote:
> There is no promise that panes will arrive in order
There is no promise that panes will arrive in order (especially the
further you get "downstream"). Though they may be approximately so,
it's dangerous to assume that. You can inspect the sequential index in
PaneInfo to determine whether a pane is older than other panes you
have seen.
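
A minimal sketch of that check, in plain Python rather than Beam API code. It assumes the sequential pane index (PaneInfo.getIndex() in the Java SDK) has already been read from each pane; the class name, key, and window labels below are hypothetical and exist only for illustration.

```python
from typing import Dict, Hashable, Tuple


class LatestPaneFilter:
    """Remembers the highest pane index seen per (key, window)."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[Hashable, Hashable], int] = {}

    def is_stale(self, key: Hashable, window: Hashable, pane_index: int) -> bool:
        """Return True if a pane with an equal or higher index was already seen."""
        slot = (key, window)
        seen = self._latest.get(slot, -1)
        if pane_index <= seen:
            return True
        # This pane is newer than anything seen so far for this key/window.
        self._latest[slot] = pane_index
        return False


pane_filter = LatestPaneFilter()
# Panes for one key/window arriving out of order: indices 0, 2, then 1.
arrivals = [0, 2, 1]
stale_flags = [pane_filter.is_stale("user-1", "w0", i) for i in arrivals]
```

In a real pipeline the remembered index would have to live in per-key state rather than in an in-memory dict; the dict only keeps the sketch self-contained.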
On Wed, Jun
Regarding Python, yes and no. Python doesn't distinguish at compile
time between (1), (2), and (6), but that doesn't mean it isn't part of
the public API and people might start counting on it, so it's in some
sense worse. We can also do (3) (which is less cumbersome in Python,
either returning a
In lieu of doing a migration to pytest, which is a large effort, I'm trying
to do the same using nose.
Opened https://issues.apache.org/jira/browse/BEAM-7641
On Tue, Jun 25, 2019 at 4:01 PM Udi Meiri wrote:
> I was thinking that our test infrastructure could use an upgrade to pytest.
>
> Some
Hi Łukasz,
See answers inline.
Regards,
Mikhail.
On Wed, Jun 26, 2019 at 7:47 AM Łukasz Gajowy wrote:
> Hi Mikhail!
>
> Together with Kamil we're investigating the possibilities of creating
> alerts for anomalies for the metrics collected from various tests (load, IO
> tests, other performance
BTW regarding the Python SDK, I think the answer to this question is simpler
for the Python SDK due to the lack of types. Most examples I know just return a
PCollection from the Write transform which may or may not be ignored by
users. If the PCollection is used, the user should be aware of the element
Hi Community,
I am trying to understand the Beam model and have a question related to
accumulating mode and panes:
Accumulating mode means that every time a trigger fires, it emits all
values seen so far in a window (so it's called accumulating), an example
from Beam programming model guide[1] sets
On Wed, Jun 26, 2019 at 5:46 AM Robert Bradshaw wrote:
> Good question.
>
> I'm not sure what could be done with (5) if it contains no deferred
> objects (e.g. there's nothing to wait on).
>
> There is also (6) return PCollection. The
> advantage of (2) is that one can migrate to (1) or (6)
Hi Mikhail!
Together with Kamil we're investigating the possibilities of creating
alerts for anomalies for the metrics collected from various tests (load, IO
tests, other performance tests). This is unfortunately impossible to do in
Perfkit explorer tool that we're using for displaying the
On Sat, Jun 22, 2019 at 1:09 AM Valentyn Tymofieiev wrote:
>
> On Tue, Jun 18, 2019 at 2:01 PM Ahmet Altay wrote:
>>
>> Thank you for the update, very helpful. It might be worthwhile to share a
>> version of this with user mailing list after 2.14.
>
>
> I think so too, we can send an update to
Earlier than the input watermark only applies to event time timers, but the
above problem holds for processing time timers as well.
On Wed, Jun 26, 2019, 1:50 PM Robert Bradshaw wrote:
> Yeah, it wouldn't be optimal performance-wise, but I think it's good
> to keep the bar for a correct SDK
That sounds very interesting. I will update the JIRA with a link to this
discussion and then see whether this can be easily implemented
in the DirectRunner.
Thanks for this discussion!
On 6/26/19 2:50 PM, Robert Bradshaw wrote:
Yeah, it wouldn't be optimal performance-wise, but I
Yeah, it wouldn't be optimal performance-wise, but I think it's good
to keep the bar for a correct SDK low. Might still be better than
sending one timer per bundle, and you only pay the performance if
timers are set earlier than the input watermark (and there was a timer
firing in this range).
Good question.
I'm not sure what could be done with (5) if it contains no deferred
objects (e.g. there's nothing to wait on).
There is also (6) return PCollection. The
advantage of (2) is that one can migrate to (1) or (6) without
changing the public API, while giving something to wait on without
This would have a lot of performance problems (especially since there is
user code that caches within a bundle, and invalidates the cache at the end
of every bundle). However this would be a valid "lazy" implementation.
On Wed, Jun 26, 2019 at 2:29 PM Robert Bradshaw wrote:
> Note also that a
Note also that a "lazy" SDK implementation would be to simply return
all the timers (as if they were new timers) to runner once a timer set
(before or at the last requested timer in the bundle) is encountered.
E.g. Suppose we had timers T1, T3, T5 in the bundle. On firing T1, we
set T2 and delete
Beam 2.4.0 introduced the Wait transform, which delays
processing of each window in a PCollection until signaled. This opened
up interesting new patterns, for example writing to a database and, once
that write is ‘fully’ done, writing to another database.
To support this pattern an IO connector Write transform
I like this option the best. It might be trickier to implement, but seems
like it would be the most consistent solution.
Another problem it would solve is the following: let's say a bundle arrives
containing timers T1 and T2, and while processing T1 the user code deletes
T2 (or resets it to a
Don't get me wrong, I think that this would be the best option, I just
discarded it at the beginning. Maybe it can be done, but I'm not
able to tell. At least what I saw in DirectRunner, I think that it works
so that first timers are extracted from TimerInternals and then
executed,
Bundles would still be immutable pieces of work. E.g. in this case, T2
would never be sent to the runner.
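
A toy model of in-order timer firing within one bundle, as a sketch only. It is plain Python and not Beam or runner code; the TimerQueue class, the timer ids, and the timestamps are all made up for illustration. Timers fire in timestamp order, a timer set during a callback fires before any later timer, and a timer deleted during a callback is skipped.

```python
import heapq
from typing import Callable, Dict, List, Tuple


class TimerQueue:
    """Hypothetical model of the timers of one bundle, fired in timestamp order."""

    def __init__(self, timers: Dict[str, int]) -> None:
        # Map of timer id -> currently scheduled timestamp.
        self._current: Dict[str, int] = dict(timers)
        self._heap: List[Tuple[int, str]] = [(ts, tid) for tid, ts in timers.items()]
        heapq.heapify(self._heap)

    def set(self, timer_id: str, timestamp: int) -> None:
        """Set or reset a timer; any older heap entry for it becomes stale."""
        self._current[timer_id] = timestamp
        heapq.heappush(self._heap, (timestamp, timer_id))

    def delete(self, timer_id: str) -> None:
        """Delete a timer; its heap entry is skipped when popped."""
        self._current.pop(timer_id, None)

    def fire_all(self, callback: Callable[[str, int, "TimerQueue"], None]) -> List[str]:
        """Fire timers in order, honoring sets and deletes made by the callback."""
        fired: List[str] = []
        while self._heap:
            timestamp, timer_id = heapq.heappop(self._heap)
            if self._current.get(timer_id) != timestamp:
                continue  # deleted or reset since this entry was queued
            del self._current[timer_id]
            fired.append(timer_id)
            callback(timer_id, timestamp, self)
        return fired


def on_timer(timer_id: str, timestamp: int, timers: TimerQueue) -> None:
    # While processing T1, user code deletes T2 and sets a new timer T3.
    if timer_id == "T1":
        timers.delete("T2")
        timers.set("T3", 3)


queue = TimerQueue({"T1": 1, "T2": 2, "T4": 4})
fired_order = queue.fire_all(on_timer)
```

Here T2 is never fired, and T3 fires before T4 even though it was set after the bundle was formed.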
On Wed, Jun 26, 2019 at 1:02 PM Jan Lukavský wrote:
>
> I think that this approach breaks the assumption that bundles are
> executed as immutable pieces of work. This way, runners would have
I think that this approach breaks the assumption that bundles are
executed as immutable pieces of work. This way, runners would have to
update the runner while executing it. It is another possible option, but
seems to have issues of its own.
On 6/26/19 12:28 PM, Robert Bradshaw wrote:
Another option, that is nice from an API perspective but places a
burden on SDK implementers (and possibly runners), is to maintain the
ordering of timers by requiring timers to be fired in order, and if
any timers are set to fire them immediately before processing later
timers. In other words, if
Hi,
I have mentioned an issue I have come across [1] on several other
threads, but it probably didn't attract the attention that it deserves.
I will try to restate the problem here for clarity:
- on runners that use the concept of bundles (the original issue mentions
DirectRunner, but it
Thanks. This is a great write-up. +1 to an official tweet.
On Wed, Jun 26, 2019, 6:11 AM Reza Rokni wrote:
> Thank you for putting this together!
>
> On Wed, 26 Jun 2019 at 01:23, Ahmet Altay wrote:
>
>> Thank you for writing and sharing this. I enjoyed reading it :) I think
>> it is worth