Re: [ANNOUNCE] New committer: Brian Hulette

2019-11-14 Thread Gleb Kanterov
Congratulations! On Fri, Nov 15, 2019 at 5:44 AM Valentyn Tymofieiev wrote: > Congratulations, Brian! > > On Thu, Nov 14, 2019 at 6:25 PM jincheng sun wrote: > >> Congratulations Brian! >> >> Best, >> Jincheng >> >> On Fri, Nov 15, 2019 at 7:19 AM Kyle Weaver wrote: >> >>> Thanks for your contributions

Re: On processing event streams

2019-11-14 Thread Kenneth Knowles
On Tue, Nov 12, 2019 at 1:36 AM Jan Lukavský wrote: > Hi, > > this is a follow-up to multiple threads covering the topic of how to (in a > unified way) process event streams. Event streams can be characterized > by a common property: the ordering of events matters. 1. events are ordered (hence

slides?

2019-11-14 Thread Austin Bennett
Hi Dev and User, Wondering if people would find a benefit from collecting slides from Meetups/Talks? Seems that this could be appropriate on the website, for instance. Not sure whether this has been asked previously, so bringing it to the group. Cheers, Austin

Re: Why is Pipeline not Serializable and can it be changed to be Serializable

2019-11-14 Thread Pulasthi Supun Wickramasinghe
Hi Luke, That is the approach I am taking currently to handle the functions. I might have to do the same for Coders as well since some coders have the same issue of not having default constructors. I also initially considered converting the pipeline into a JSON format and sending that over to

Re: Proposal: @RequiresTimeSortedInput

2019-11-14 Thread Kenneth Knowles
Hi Jan, Sorry for the very slow reply. Your proposed feature is sensitive to all data that is not in timestamp order, which is not the same as late. In Beam "late" is defined as "assigned to a window where the watermark has passed the end of the window and a 'final' aggregate has been produced".
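
The difference between "out of timestamp order" and "late" can be illustrated with a small Python sketch. It is a simplification under stated assumptions: fixed windows, a watermark value supplied by the caller, and "passed the end of the window" read as watermark >= window end. The helper names are hypothetical and this is not Beam's implementation; allowed lateness and triggers are ignored.

```python
def out_of_order_indices(timestamps):
    """Return indices of elements whose timestamp is lower than an earlier element's."""
    result = []
    max_seen = None
    for i, ts in enumerate(timestamps):
        if max_seen is not None and ts < max_seen:
            result.append(i)
        else:
            max_seen = ts
    return result


def is_late(timestamp, watermark, window_size):
    """Return True if the element's fixed window ended at or before the watermark.

    Assumption: fixed windows aligned at zero; allowed lateness is ignored.
    """
    window_end = (timestamp // window_size + 1) * window_size
    return window_end <= watermark
```

For example, with timestamps 1, 5, 3, 7, the element with timestamp 3 arrives out of order, yet with 10-unit windows and a watermark of 8 it is not late, because its window has not closed.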

Re: [ANNOUNCE] New committer: Brian Hulette

2019-11-14 Thread Valentyn Tymofieiev
Congratulations, Brian! On Thu, Nov 14, 2019 at 6:25 PM jincheng sun wrote: > Congratulations Brian! > > Best, > Jincheng > > On Fri, Nov 15, 2019 at 7:19 AM Kyle Weaver wrote: > >> Thanks for your contributions and congrats Brian! >> >> On Thu, Nov 14, 2019 at 3:14 PM Kenneth Knowles wrote: >> >>> Hi all,

Re: [ANNOUNCE] New committer: Brian Hulette

2019-11-14 Thread jincheng sun
Congratulations Brian! Best, Jincheng On Fri, Nov 15, 2019 at 7:19 AM Kyle Weaver wrote: > Thanks for your contributions and congrats Brian! > > On Thu, Nov 14, 2019 at 3:14 PM Kenneth Knowles wrote: > >> Hi all, >> >> Please join me and the rest of the Beam PMC in welcoming a new committer: >> Brian

Re: Cleaning up Approximate Algorithms in Beam

2019-11-14 Thread Robert Bradshaw
On Thu, Nov 14, 2019 at 1:06 AM Kenneth Knowles wrote: > Wow. Nice summary, yes. Major calls to action: > > 0. Never allow a combiner that does not make the format of its state > clear in its name/URN. The "update compatibility" problem makes their > internal accumulator state essentially

Re: Python Precommit duration pushing 2 hours

2019-11-14 Thread Robert Bradshaw
On Thu, Nov 14, 2019 at 2:58 PM Ahmet Altay wrote: > > On Thu, Nov 14, 2019 at 2:55 PM Mikhail Gryzykhin wrote: >> >> Hi Everyone, >> >> The Python precommit phrase times out at 2 hours for (roughly) 80% of the jobs. >> This also blocks release branch validation. I suggest bumping the timeout to >>

Re: Date/Time Ranges & Protobuf

2019-11-14 Thread Luke Cwik
The timestamps flow both ways since: * IO authors are responsible for saying what the watermark timestamp is and stateful DoFns also allow for users to set timers in relative and processing time domains. * Runner authors need to understand and merge these timestamps together to compute what the

Re: [ANNOUNCE] New committer: Brian Hulette

2019-11-14 Thread Kyle Weaver
Thanks for your contributions and congrats Brian! On Thu, Nov 14, 2019 at 3:14 PM Kenneth Knowles wrote: > Hi all, > > Please join me and the rest of the Beam PMC in welcoming a new committer: > Brian Hulette > > Brian introduced himself to dev@ earlier this year and has been > contributing

Re: Date/Time Ranges & Protobuf

2019-11-14 Thread Sam Rohde
My two cents are we just need a proto representation for timestamps and durations that includes units. The underlying library can then determine what to do with it. Then further, we can have a standard across Beam SDKs and Runners of how to interpret the proto. Using a raw int64 for timestamps and
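
To make the suggestion concrete, here is a rough Python sketch of a timestamp value that carries its unit explicitly instead of being a raw int64. The names `Unit`, `Timestamp`, `to_nanos`, and `convert` are hypothetical and are not an existing Beam proto or API; the sketch only shows how a receiver could interpret a value once the unit travels with it.

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    # Each member's value is the number of nanoseconds in one unit.
    NANOS = 1
    MICROS = 1_000
    MILLIS = 1_000_000
    SECONDS = 1_000_000_000


@dataclass(frozen=True)
class Timestamp:
    value: int
    unit: Unit

    def to_nanos(self) -> int:
        """Express the timestamp as an integer number of nanoseconds."""
        return self.value * self.unit.value

    def convert(self, target: Unit) -> "Timestamp":
        """Convert to the target unit, rounding toward negative infinity."""
        return Timestamp(self.to_nanos() // target.value, target)
```

Converting to a coarser unit loses precision, so a shared convention for rounding (floor is assumed here) would need to be part of any cross-SDK standard.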

[ANNOUNCE] New committer: Brian Hulette

2019-11-14 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Brian Hulette Brian introduced himself to dev@ earlier this year and has been contributing since then. His contributions to Beam include explorations of integration with Arrow, standardizing coders, portability for

Re: Python Precommit duration pushing 2 hours

2019-11-14 Thread Ahmet Altay
On Thu, Nov 14, 2019 at 2:55 PM Mikhail Gryzykhin wrote: > Hi Everyone, > > The Python precommit phrase times out at 2 hours for (roughly) 80% of the jobs. > This also blocks release branch validation. I suggest bumping the timeout > to 3 hours while we are working on a proper solution. This way many

Re: Python Precommit duration pushing 2 hours

2019-11-14 Thread Mikhail Gryzykhin
Hi Everyone, The Python precommit phrase times out at 2 hours for (roughly) 80% of the jobs. This also blocks release branch validation. I suggest bumping the timeout to 3 hours while we are working on a proper solution. This way many people can get unblocked. I believe the change can be rather

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Jan Lukavský
On 11/14/19 9:50 PM, Daniel Robert wrote: Alright, thanks everybody. I'm really appreciative of the conversation here. I think I see where my disconnect is and how this might all work together for me. There are some bugs in the current rabbit implementation that I think have confused my

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Reuven Lax
Immediately after a source, the window is the Global window, which means you will get global deduplication. On Thu, Nov 14, 2019 at 12:50 PM Daniel Robert wrote: > Alright, thanks everybody. I'm really appreciative of the conversation > here. I think I see where my disconnect is and how this

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Daniel Robert
Alright, thanks everybody. I'm really appreciative of the conversation here. I think I see where my disconnect is and how this might all work together for me. There are some bugs in the current rabbit implementation that I think have confused my understanding of the intended semantics. I'm

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Jan Lukavský
Just as a matter of curiosity, I wonder why it would be necessary to assign (local) UUIDs to RabbitMQ streams. There seem to be only two options: a) RabbitMQ does not support restoring a client connection (this is valid, many sources work like that, e.g. plain websocket, or UDP stream) b) it

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Eugene Kirpichov
Hi Daniel, On Wed, Nov 13, 2019 at 8:26 PM Daniel Robert wrote: > I believe I've nailed down a situation that happens in practice that > causes Beam and Rabbit to be incompatible. It seems that runners can and do > make assumptions about the serializability (via Coder) of a CheckpointMark. > >

Re: Why is Pipeline not Serializable and can it be changed to be Serializable

2019-11-14 Thread Luke Cwik
You should create placeholders inside of your Twister2/OpenMPI implementation that represent these functions and then instantiate actual instances of them on the workers if you want to write your own pipeline representation and format for OpenMPI/Twister2. Or consider converting the pipeline to

Re: Completeness of Beam Java Dependency Check Report

2019-11-14 Thread Kenneth Knowles
On Thu, Nov 14, 2019 at 8:04 AM Alexey Romanenko wrote: > Good example about Guava deps, let me go a bit deeper. > > $ find . -name build.gradle | xargs grep library.java.guava > ./sdks/java/core/build.gradle: shadowTest library.java.guava_testlib > > ./sdks/java/io/kinesis/build.gradle:

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Reuven Lax
Just a thought: instead of embedding the RabbitMQ streams inside the checkpoint mark, could you keep a global static map of RabbitMQ streams keyed by a unique UUID. Then all you have to serialize inside the CheckpointMark is the UUID; you can look up the actual stream in the constructor of the
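
The idea above can be sketched roughly as follows. This is a minimal Python illustration of the pattern only, not Beam's or RabbitMQ's API: `FakeStream`, `StreamRegistry`, and `CheckpointMarkSketch` are hypothetical names, and the `encode`/`decode` pair stands in for whatever Coder a runner would apply to the checkpoint mark.

```python
import threading
import uuid


class FakeStream:
    """Stand-in for a live connection that cannot be serialized (hypothetical)."""

    def __init__(self, name):
        self.name = name


class StreamRegistry:
    """Process-wide map from a UUID string to a live stream."""

    _streams = {}
    _guard = threading.Lock()

    @classmethod
    def register(cls, stream):
        """Store the stream and return the key that identifies it."""
        key = str(uuid.uuid4())
        with cls._guard:
            cls._streams[key] = stream
        return key

    @classmethod
    def lookup(cls, key):
        """Return the stream for this key, or None if it is unknown here."""
        with cls._guard:
            return cls._streams.get(key)


class CheckpointMarkSketch:
    """Carries only the UUID; the live stream is resolved on demand."""

    def __init__(self, stream_id):
        self.stream_id = stream_id

    def encode(self):
        """Serialize the mark: only the UUID string is written."""
        return self.stream_id.encode("utf-8")

    @classmethod
    def decode(cls, data):
        """Rebuild a mark from its serialized UUID."""
        return cls(data.decode("utf-8"))

    def stream(self):
        # Returns None when this process never registered the stream,
        # for example after the mark was shipped to a different worker.
        return StreamRegistry.lookup(self.stream_id)
```

One caveat of this pattern: the lookup only succeeds in the process that registered the stream, so a mark decoded elsewhere has to handle a missing entry.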

Re: Wiki access

2019-11-14 Thread Thomas Weise
Done, you should be all set. On Thu, Nov 14, 2019 at 9:57 AM Elliotte Rusty Harold wrote: > Hello, > > May I please have access to edit the Wiki? username is elharo > > Thanks. > > -- > Elliotte Rusty Harold > elh...@ibiblio.org >

Wiki access

2019-11-14 Thread Elliotte Rusty Harold
Hello, May I please have access to edit the Wiki? username is elharo Thanks. -- Elliotte Rusty Harold elh...@ibiblio.org

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Jan Lukavský
Hi, answers inline. On 11/14/19 4:15 PM, Daniel Robert wrote: We may be talking past each other a bit, though I do appreciate the responses. Rabbit behaves a lot like a relational database in terms of state required. A connection is analogous to a database connection, and a channel (poor

Re: Completeness of Beam Java Dependency Check Report

2019-11-14 Thread Alexey Romanenko
Good example about Guava deps, let me go a bit deeper. > $ find . -name build.gradle | xargs grep library.java.guava > ./sdks/java/core/build.gradle: shadowTest library.java.guava_testlib > ./sdks/java/io/kinesis/build.gradle: testCompile library.java.guava_testlib Regarding using

Re: [discuss] Using a logger hierarchy in Python

2019-11-14 Thread Thomas Weise
Awesome, thanks Chad! On Wed, Nov 13, 2019 at 10:26 PM Chad Dombrova wrote: > Hi Thomas, > > >> Will this include the ability for users to configure logging via pipeline >> options? >> > > We're working on a proposal to allow pluggable logging handlers that can > be configured via pipeline

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Daniel Robert
We may be talking past each other a bit, though I do appreciate the responses. Rabbit behaves a lot like a relational database in terms of state required. A connection is analogous to a database connection, and a channel (poor analogy here) is similar to an open transaction. If the

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Jan Lukavský
Hi, as I said, I didn't dig too deep into that, but what I saw was [1]. Generally, if RabbitMQ had no way to recover a subscription (which I don't think is the case), then it would be incompatible not just with Beam, but with any fault-tolerant semantics. [1]

Re: RabbitMQ and CheckpointMark feasibility

2019-11-14 Thread Daniel Robert
On 11/14/19 2:32 AM, Jan Lukavský wrote: Hi Danny, as Eugene pointed out, there are essentially two "modes of operation" of CheckpointMark. It can:  a) be used to somehow restore state of a reader (in call to UnboundedSource#createReader)  b) confirm processed elements in

Re: Triggers still finish and drop all data

2019-11-14 Thread Kenneth Knowles
On Fri, Nov 8, 2019 at 9:44 AM Steve Niemitz wrote: > Yeah that looks like what I had in mind too. I think the most useful > notification output would be a KV of (K, summary)? > Sounds about right. Some use cases may not care about the summary, but just the notification. But for most runners

Re: [CANCELLED] [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

2019-11-14 Thread Kenneth Knowles
Hi Jan, I want to acknowledge your careful consideration of the community here. I myself have simply not had the time to dedicate to considering this proposal. So, like Max, I would have a bit of an "outside" perspective so would hesitate to cast any sort of vote. I think you have chosen a good

Re: Cleaning up Approximate Algorithms in Beam

2019-11-14 Thread Kenneth Knowles
Wow. Nice summary, yes. Major calls to action: 0. Never allow a combiner that does not make the format of its state clear in its name/URN. The "update compatibility" problem makes their internal accumulator state essentially part of their public API. Combiners named for what they do are an