Re: (Virtual) Beam Developers Meetup - 5/4 @ 8am PDT

2016-04-29 Thread Seetharam Venkatesh
In my experience, G+Hangouts has a limitation of 10 attendees which is lame. You could explore zoom or some free conferencing service. Thanks! On Fri, Apr 29, 2016 at 9:49 PM Frances Perry wrote: > Yes -- will definitely aim to present it in a way that newbies can learn > and participate! > > O

Re: (Virtual) Beam Developers Meetup - 5/4 @ 8am PDT

2016-04-29 Thread Frances Perry
Yes -- will definitely aim to present it in a way that newbies can learn and participate! On Fri, Apr 29, 2016 at 9:42 PM, Yash Sharma wrote: > +1 for google hangout. > Will the discussion be useful for newbie contributors? > > - Thanks, via mobile, excuse brevity. > On Apr 30, 2016 2:39 PM, "F

Re: (Virtual) Beam Developers Meetup - 5/4 @ 8am PDT

2016-04-29 Thread Yash Sharma
+1 for google hangout. Will the discussion be useful for newbie contributors? - Thanks, via mobile, excuse brevity. On Apr 30, 2016 2:39 PM, "Frances Perry" wrote: > As discussed earlier this month, we're going to try a virtual Beam meeting > for anyone who is interested in joining. > > *When:*

(Virtual) Beam Developers Meetup - 5/4 @ 8am PDT

2016-04-29 Thread Frances Perry
As discussed earlier this month, we're going to try a virtual Beam meeting for anyone who is interested in joining. *When:* Wednesday 5/4 at 8am PDT *Where:* Google Hangouts ok by folks? Alternative suggestions? *Agenda:* Here's a list to get us started. Please suggest other things you'd like to

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Frances Perry
> > @Frances Sources are not simple DoFns. They add additional > functionality, e.g. checkpointing, watermark generation, creating > splits. If we want sinks to be portable, we should think about a > dedicated interface. At least for the checkpointing. > We might be mixing sources and sinks in thi

Jenkins build is back to normal : beam_Release_NightlySnapshot #25

2016-04-29 Thread Apache Jenkins Server
See

Re: Build failed in Jenkins: beam_Release_NightlySnapshot #24

2016-04-29 Thread Jean-Baptiste Onofré
Thx ! On 04/29/2016 06:51 PM, Davor Bonaci wrote: Known breakage from yesterday. Restarted. On Fri, Apr 29, 2016 at 12:11 AM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: See

Re: Build failed in Jenkins: beam_Release_NightlySnapshot #24

2016-04-29 Thread Davor Bonaci
Known breakage from yesterday. Restarted. On Fri, Apr 29, 2016 at 12:11 AM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See > > > Changes: > > [tgroh] Add CommittedResult > > [tgroh] Stop cloning coders in th

Build failed in Jenkins: beam_Release_NightlySnapshot #24

2016-04-29 Thread Apache Jenkins Server
See Changes: [tgroh] Add CommittedResult [tgroh] Stop cloning coders in the InProcessRunner [swegner] Consolidate checkstyle configuration in new 'build-tools' module -- [...truncate

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Maximilian Michels
@Raghu Thanks for the explanation. I had already realized after Aljoscha's comment that Kafka enforces this data model. I was too deep in Flink land, where we usually only use the value part (although it is also possible to set the key). Good to hear you agree on factoring out the watermark/ti

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Raghu Angadi
On Fri, Apr 29, 2016 at 2:11 AM, Maximilian Michels wrote: > Further, the KafkaIO enforces a data model which AFAIK is > not enforced by the Beam model. I don't know the details for this > design decision but I would like this to be communicated before it is > merged into the master. > structu

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Raghu Angadi
I agree with the sentiment here. Please note that runner-specific sources continue to work as they do now. The original question was about '.useNative()', which requires generic Beam sources to interact with specific runners (along those lines). On Fri, Apr 29, 2016 at 2:11 AM, Maximilian Miche

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Frances Perry
+Dan Halperin (who is OOO for a couple of days) Yes, there are plans for unbounded sinks. But unlike sources, sinks don't add any additional functionality beyond a ParDo (they just make it more obvious how to use a ParDo appropriately to get the right fault tolerance). So they haven't been priorit

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Jean-Baptiste Onofré
Hi Amit, yes, definitely, the highest priority is that the existing runners have to fully work with Beam IO. I will work with you on the Spark runner about that. Regards JB On 04/29/2016 01:06 PM, Amit Sela wrote: +1 on Max's comment on active discussion for connector API. I think that we c

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Jean-Baptiste Onofré
Hi Max, Understood. It reminds me of a discussion we had with Dan. Regards JB On 04/29/2016 12:44 PM, Maximilian Michels wrote: @Aljoscha I didn't know that Kafka always stores Key/Value but I see that we also have support for setting Kafka keys in Flink. @JB I get your point that a sink is si

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Amit Sela
+1 on Max's comment on active discussion for connector API. I think that we can use Spark and Flink's existing connectors (Spark supports many) as test cases and consider a bottom-up design approach rather than top-down, especially since we're in incubation. Also +1 for Davor that runners should m

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Maximilian Michels
@Aljoscha I didn't know that Kafka always stores Key/Value but I see that we also have support for setting Kafka keys in Flink. @JB I get your point that a sink is simply a DoFn, but a ParDo is not a good match for a sink. A Sink doesn't produce a PCollection but represents the end of a pipeline.

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Jean-Baptiste Onofré
Hi, KafkaIO uses KafkaRecord, which is basically key, value + some metadata (topic, partition, offset). Can you describe the behavior of an UnboundedSink? UnboundedSource is obvious: it keeps consuming data and creating a PCollection that is sent into the pipeline. But UnboundedSink? Do you mean that

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Aljoscha Krettek
Hi, I think the fact that KafkaIO has a model comes from Kafka having a model. I imagine most sources will emit the type of values appropriate for them. I agree with Max that the lack of an UnboundedSink seems strange. Do we have any "sinks" implemented as a ParDo already? Cheers, Aljoscha On

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Jean-Baptiste Onofré
Hi Max, Your four points are valid and we have already discussed them. 1. +1, the runner API should bring utils around that. 2. UnboundedSink has been discussed (I don't see a strong use case for now, as it takes a PCollection). 3. +1, Dan talked about improving the hierarchy. 4. +1, I'm working

Re: [DISCUSS] Beam IO &runners native IO

2016-04-29 Thread Maximilian Michels
@Amir: This is the Developer mailing list. Please post your questions regarding Beam on the user mailing list. +1 for portability in general. However, I see some crucial TODOs coming up: 1) Improving the integration of Runners with the Beam sink/source API 2) Providing interfaces to implement new