Re: Meet up at ApacheCon Seville

2016-11-14 Thread Sergio Fernández
I've just discussed it with Ismael: maybe we can organize a Beam BoF: https://apachebigdataeu2016.sched.org/event/8giP What do you think? On Sun, Nov 13, 2016 at 10:00 PM, Neelesh Salian wrote: > Anyone here today? > > On Nov 11, 2016 12:01 PM, "Stephan Ewen" wrote: > > > I'll also be in Sevill

Re: Meet up at ApacheCon Seville

2016-11-14 Thread Jean-Baptiste Onofré
Hi Sergio, good idea! And it seems you already registered: great! There are already some Beamers today ;) Thanks! Regards JB On 11/14/2016 12:48 PM, Sergio Fernández wrote: I've just discussed it with Ismael: maybe we can organize a Beam BoF: https://apachebigdataeu2016.sched.org/event/8gi

Re: [PROPOSAL] Change to KafkaIO splits

2016-11-14 Thread Amit Sela
For Kafka, I don't think you're over-splitting if you split according to Kafka partitions. If your backend provides enough parallelism, you'll get a 1-1 (Source splits-to-Kafka partitions) parallelism from the KafkaIO today. The problem is with the backend not providing enough parallelism: - C
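As a rough illustration of the 1-1 case described above, here is a minimal sketch in plain Python. It is not KafkaIO's actual code: the function name `assign_partitions` and the round-robin policy are assumptions made for illustration only. It shows that when the backend asks for at least as many splits as there are partitions, each split reads exactly one partition, and that partitions get grouped only when fewer splits are requested.

```python
def assign_partitions(partitions, desired_num_splits):
    """Hypothetical helper: spread Kafka partition ids over source splits.

    Never creates more splits than partitions, so a large enough
    desired_num_splits yields one partition per split.
    """
    num_splits = max(1, min(desired_num_splits, len(partitions)))
    splits = [[] for _ in range(num_splits)]
    # Round-robin assignment (an assumed policy, chosen for simplicity).
    for index, partition in enumerate(sorted(partitions)):
        splits[index % num_splits].append(partition)
    return splits


if __name__ == "__main__":
    print(assign_partitions([0, 1, 2, 3], 4))     # one partition per split
    print(assign_partitions([0, 1, 2, 3, 4], 2))  # partitions grouped
```

With fewer requested splits than partitions, some splits necessarily read several partitions, which is the case where parallelism is limited by the backend rather than by the source.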

ApacheCon: Apache Beam BoF and Beam dinner

2016-11-14 Thread Jean-Baptiste Onofré
Following Sergio's idea, we added a BoF space: Apache Beam and You! If you want to discuss Beam or get some details, mostly community oriented, don't hesitate to join! See you tonight. On the other hand, we plan to have an informal Apache Beam dinner tomorrow evening. If you wan

Jenkins build is back to stable : beam_Release_NightlySnapshot #231

2016-11-14 Thread Apache Jenkins Server
See

Jenkins build is back to normal : beam_SeedJob_Website #14

2016-11-14 Thread Apache Jenkins Server
See

Re: Verify a new Runner

2016-11-14 Thread 李劲松(之信)
Thanks a lot Kenn for the information! It is very helpful. We have a plan now and are working on the test... will let you know if we encounter any problem. Thank you! Best, Zhixin -- From: Kenneth Knowles Sent: November 8, 2016 (Tuesday) 12:11

Re: [PROPOSAL] Change to KafkaIO splits

2016-11-14 Thread Raghu Angadi
I agree with all of this, except I think this also avoids the need to "remember" the original number of > parallelism. KafkaIO still needs to decide how many splits it needs to return in generateInitialSplits(). 'Update' could be a Dataflow-specific concern. We could drop it for this thread, thoug

Re: [PROPOSAL] Change to KafkaIO splits

2016-11-14 Thread Raghu Angadi
On Sun, Nov 13, 2016 at 10:14 PM, Davor Bonaci wrote: > Luke is bringing up great questions, I think. > Yes, better handling of 'desiredNumSplits' by a runner would be very useful. I wanted to limit my proposal to what a source like KafkaIO could do on its own. > My first impression is that th

Batcher DoFn

2016-11-14 Thread Josh Cogan
Hi Dev, After offline discussions with Gus, I'd like to propose we include a Batcher function into contrib/. This would be a DoFn that behaves like this: [1,2,3,4,5] -> Batcher(max_size=2) -> [[1,2],[3,4],[5]] It's simple code, but it also shows off that values can still be yielded from finish_bund
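The behaviour described above can be sketched in plain Python. This is only a stand-in, not the proposed contrib/ code and not the Beam API: the `Batcher` class and the `run_bundle` driver are hypothetical names that mimic a DoFn's process/finish-bundle lifecycle for a single bundle, and the sketch ignores windowing and timestamps entirely.

```python
class Batcher:
    """Plain-Python stand-in for the proposed DoFn (not a Beam class)."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._buffer = []

    def process(self, element):
        # Buffer the element and emit a batch once the buffer is full.
        self._buffer.append(element)
        if len(self._buffer) >= self.max_size:
            batch, self._buffer = self._buffer, []
            yield batch

    def finish_bundle(self):
        # Emit whatever is left when the bundle ends.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            yield batch


def run_bundle(fn, elements):
    """Hypothetical driver: feed one bundle through fn and collect output."""
    output = []
    for element in elements:
        output.extend(fn.process(element))
    output.extend(fn.finish_bundle())
    return output


if __name__ == "__main__":
    print(run_bundle(Batcher(max_size=2), [1, 2, 3, 4, 5]))
```

The final partial batch is only emitted from `finish_bundle`, which is the point the proposal highlights: values can still be yielded at the end of a bundle.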

Re: KafkaIO.read withTimestampFn

2016-11-14 Thread Raghu Angadi
Hi Kobi, Missed this earlier. Could you describe how it is limiting you? The timestampFn depends on the type of key and value. That's why once you set it, we don't allow modifying keyCoder() or valueCoder() etc. E.g., a scenario we want to avoid (if we didn't have this restriction): KafkaIO.read()

Re: Batcher DoFn

2016-11-14 Thread Kenneth Knowles
Hi Josh, I think you probably mean something like buffering elements in a field on the DoFn, emitting batches as appropriate, and emitting the remainder in finishBundle. Unfortunately there are two issues: - in the presence of windowing the DoFn might be invoked in different windows, so you'll