Re: Adding the streaming project to the main repository

2014-08-19 Thread Fabian Hueske
Thanks for the explanation! Very nice set of features. Looking forward to checking it out myself :-) 2014-08-18 21:38 GMT+02:00 Gyula Fóra : > Hey, > > The simple reduce is like what you said, yes. But there is also a grouped > reduce, which you can use by calling .groupBy(keyposition) and then reduce

Re: Adding the streaming project to the main repository

2014-08-18 Thread Henry Saputra
Thanks, I just assigned you the issue =) - Henry On Mon, Aug 18, 2014 at 11:38 PM, Márton Balassi wrote: > My username is mbalassi. > I've started watching the issue to give you a link. :) > > > On Tue, Aug 19, 2014 at 8:07 AM, Henry Saputra > wrote: > >> Hi Marton, >> >> I created the JIRA ticke

Re: Adding the streaming project to the main repository

2014-08-18 Thread Márton Balassi
My username is mbalassi. I've started watching the issue to give you a link. :) On Tue, Aug 19, 2014 at 8:07 AM, Henry Saputra wrote: > Hi Marton, > > I created the JIRA ticket to track the streaming documentation: > https://issues.apache.org/jira/browse/FLINK-1058 > Somehow I could not find yo

Re: Adding the streaming project to the main repository

2014-08-18 Thread Henry Saputra
Hi Marton, I created the JIRA ticket to track the streaming documentation: https://issues.apache.org/jira/browse/FLINK-1058 Somehow I could not find your ASF JIRA username. Could you tell me what is your ASF JIRA username? - Henry On Mon, Aug 18, 2014 at 10:29 PM, Márton Balassi wrote: > Sure,

Re: Adding the streaming project to the main repository

2014-08-18 Thread Márton Balassi
Sure, please assign it to me. On Aug 19, 2014 2:44 AM, "Henry Saputra" wrote: > Thanks Stephan. If no one objects, I will create a JIRA ticket as a reminder > to add formal documentation for the streaming feature. > > - Henry > > On Mon, Aug 18, 2014 at 11:53 AM, Stephan Ewen wrote: > > The streaming

Re: Adding the streaming project to the main repository

2014-08-18 Thread Henry Saputra
Thanks Stephan. If no one objects, I will create a JIRA ticket as a reminder to add formal documentation for the streaming feature. - Henry On Mon, Aug 18, 2014 at 11:53 AM, Stephan Ewen wrote: > The streaming code is in "flink-addons", for new/experimental code. > > Documents should come over the nex

Re: Adding the streaming project to the main repository

2014-08-18 Thread Gyula Fóra
Hey, The simple reduce is like what you said, yes. But there is also a grouped reduce, which you can use by calling .groupBy(keyposition) and then reduce. There are also reduces for windows, batchReduce and windowReduce: batch gives you a sliding window over a predefined number of records, and window r
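
A minimal sketch of the reduce variants described in this message, written in plain Python rather than against the Flink streaming API. The function names (rolling_reduce, grouped_rolling_reduce, count_window_reduce) are hypothetical and chosen only for illustration. The one-record slide of the count window is also an assumption: the message only says that the window covers a predefined number of records.

```python
from collections import deque
from functools import reduce


def rolling_reduce(stream, fn):
    """Emit the updated aggregate after every incoming record."""
    has_state = False
    acc = None
    for record in stream:
        acc = fn(acc, record) if has_state else record
        has_state = True
        yield acc


def grouped_rolling_reduce(stream, key_position, fn):
    """Keep one aggregate per key (a hash table) and emit the
    updated aggregate for the key of each incoming record."""
    state = {}
    for record in stream:
        key = record[key_position]
        state[key] = fn(state[key], record) if key in state else record
        yield state[key]


def count_window_reduce(stream, size, fn):
    """Reduce over the last `size` records.
    Assumption: the window slides by one record at a time."""
    window = deque(maxlen=size)
    for record in stream:
        window.append(record)
        if len(window) == size:
            yield reduce(fn, window)


def add_counts(left, right):
    # Sum the count field of two (word, count) tuples.
    return (left[0], left[1] + right[1])


words = [("a", 1), ("b", 1), ("a", 1)]
grouped = list(grouped_rolling_reduce(words, 0, add_counts))
```

In this sketch a streaming wordcount emits a new running count for a word each time that word arrives, which matches the "every input tuple updates the hash table and returns the updated value" reading from Fabian's question.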

Re: Adding the streaming project to the main repository

2014-08-18 Thread Fabian Hueske
Hi folks, great work! Looking at the example I have a quick question. What are the semantics of the Reduce operator? I guess it's not a window reduce. Is it backed by a hash table and every input tuple updates the hash table and returns the updated value? Cheers, Fabian 2014-08-18 20:53 GMT+02:00

Re: Adding the streaming project to the main repository

2014-08-18 Thread Stephan Ewen
The streaming code is in "flink-addons", for new/experimental code. Documents should come over the next days/weeks, definitely before we make this part of the core. Right now, I would suggest to have a look at some of the examples, to get a feeling for the addon, check for example this here: http

Re: Adding the streaming project to the main repository

2014-08-18 Thread Henry Saputra
Hmm, quick question, I could not find any documentation about the streaming support. Is it part of the source code or will there be additional doc included? - Henry On Mon, Aug 18, 2014 at 10:55 AM, Stephan Ewen wrote: > After the Apache Secretary confirmed that the SGA has arrived and the ICLAs

Re: Adding the streaming project to the main repository

2014-08-18 Thread Henry Saputra
W00t! - Henry On Mon, Aug 18, 2014 at 10:55 AM, Stephan Ewen wrote: > After the Apache Secretary confirmed that the SGA has arrived and the ICLAs > are filed, I have merged the streaming code into the master for the next > release. > > A whole bunch of code that was! > > Great work, all of you.

Re: Adding the streaming project to the main repository

2014-08-18 Thread Stephan Ewen
After the Apache Secretary confirmed that the SGA has arrived and the ICLAs are filed, I have merged the streaming code into the master for the next release. A whole bunch of code that was! Great work, all of you. Looking forward to what this blossoms into... It's a good day, today :-) On Wed,

Re: Adding the streaming project to the main repository

2014-08-08 Thread Gyula Fóra
Okay, let's have a chat about this sometime; maybe we can come up with something even better. I already talked briefly with Stephan about the possibility of adjusting the output buffers at runtime, but it seemed like the automatic flushing was a far easier and possibly even better choice. (

Re: Adding the streaming project to the main repository

2014-08-08 Thread Ufuk Celebi
Thanks for the detailed explanation! Very nice to hear. :) If your flushing writer does not give you enough control for the trade off (in general you cannot know how large records will be, right?) we can have a chat about runtime changes for this. I would be happy to help with it. In theory it

Re: Adding the streaming project to the main repository

2014-08-08 Thread Gyula Fóra
Hey guys, I might not be able to give you all the details right now, because some of the data is on my colleague's computer, but I'm gonna try :) We have a 30-machine cluster at SZTAKI with 2 cores each, not a powerhouse but good for experimenting. We tested both Flink Streaming and Storm with a

Re: Adding the streaming project to the main repository

2014-08-08 Thread Ufuk Celebi
On 08 Aug 2014, at 16:07, Kostas Tzoumas wrote: > Wow! Incredible :-) Can you share more details about the experiments you > ran (cluster setup, jobs, etc)? Same here. :-) I would be especially interested in what you mean by "partly because of the output buffers". Best wishes, Ufuk

Re: Adding the streaming project to the main repository

2014-08-08 Thread Kostas Tzoumas
Wow! Incredible :-) Can you share more details about the experiments you ran (cluster setup, jobs, etc)? On Fri, Aug 8, 2014 at 3:53 PM, Gyula Fóra wrote: > Hey All, > > Quick weekly update on the streaming project: > > It was a good week: we implemented a lot of new features and made > consi

Adding the streaming project to the main repository

2014-08-08 Thread Gyula Fóra
Hey All, Quick weekly update on the streaming project: It was a good week: we implemented a lot of new features and made considerable progress on the API too. Most notably: - Cluster performance was measured against Storm on both a simple streaming wordcount and an iterative algorithm (PageRank) and Flin

Re: Adding the streaming project to the main repository

2014-08-06 Thread Robert Metzger
Cool. If we have the confirmation from the secretary (or for foundation members: https://svn.apache.org/repos/private/foundation/officers/iclas.txt), I vote for adding the code to the "master" branch. On Wed, Aug 6, 2014 at 5:18 PM, Stephan Ewen wrote: > The ICLAs and the SGA are cleared as fa

Re: Adding the streaming project to the main repository

2014-08-06 Thread Stephan Ewen
The ICLAs and the SGA are cleared as far as I know. I think we should merge the code into the current master (not the 0.6 release, but the one after it).

Re: Adding the streaming project to the main repository

2014-07-22 Thread Ufuk Celebi
Great. :) On 21 Jul 2014, at 13:49, Stephan Ewen wrote: > I suggested to go for "immutable" by default, because it is less error > prone and gives better initial experience. Mutable objects is a switch for > performance tuning then. I think people agreed with that. Yes, everyone was in favour o

Re: Adding the streaming project to the main repository

2014-07-21 Thread Stephan Ewen
I suggested going for "immutable" by default, because it is less error prone and gives a better initial experience. Mutable objects are then a switch for performance tuning. I think people agreed with that. On Mon, Jul 21, 2014 at 1:48 PM, Stephan Ewen wrote: > Very good! > > We have an initial ef

Re: Adding the streaming project to the main repository

2014-07-21 Thread Stephan Ewen
Very good! We have an initial effort for that as well ( https://github.com/apache/incubator-flink/pull/66), so that aligns very well.

Re: Adding the streaming project to the main repository

2014-07-21 Thread Gyula Fóra
Hey, I have completely reworked the way we managed tuple serialization for streaming. Now it is possible for the user to call .setMutability(true) on an operator to enable object reuse at tuple deserialization. What do you think, what should be the default mutability setting for operators? We use
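
A minimal sketch of why the default matters, in plain Python and not based on Flink's actual serializer. The Record class and the deserialize and buffer_values helpers are hypothetical names used only for illustration. The sketch assumes that "object reuse at tuple deserialization" means one instance is refilled for every incoming record, so an operator that holds on to references sees its earlier records overwritten.

```python
class Record:
    """Hypothetical stand-in for a deserialized tuple."""

    def __init__(self):
        self.value = None


def deserialize(raw_values, reuse_objects):
    """Yield one Record per raw value.

    With reuse_objects=True the same instance is refilled each time
    (fewer allocations); with False every record is a fresh object.
    """
    shared = Record()
    for raw in raw_values:
        record = shared if reuse_objects else Record()
        record.value = raw
        yield record


def buffer_values(records):
    """An operator that keeps references to records before reading them."""
    buffered = list(records)
    return [record.value for record in buffered]


safe = buffer_values(deserialize([1, 2, 3], reuse_objects=False))
surprising = buffer_values(deserialize([1, 2, 3], reuse_objects=True))
```

With reuse switched off the buffered values come back unchanged; with reuse switched on every buffered reference points at the same object, so only the last value survives. This is the kind of error the "immutable by default" suggestion in this thread is meant to avoid.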

Re: Adding the streaming project to the main repository

2014-07-16 Thread Márton Balassi
Thanks, Robert: - ZeroMQ - thanks, we have it in another repo - Spark & LGPL - Sean Owen was kind enough to clarify the situation - BTree: The whole org.apache.flink.streaming.index is somewhat legacy code, currently being unused - was for the purpose of state management, but the AP

Re: Adding the streaming project to the main repository

2014-07-16 Thread Sean Owen
On Wed, Jul 16, 2014 at 2:02 PM, Robert Metzger wrote: > "by the way Spark has LGPL licensed packages in its NOTICE" --> did you > find the discussion in their mailing list / JIRA regarding this? Maybe they > contacted the authors of the code or got a special permission to do that? I discussed wi

Re: Adding the streaming project to the main repository

2014-07-16 Thread Robert Metzger
Cool. Thanks for the update. I think you can host the ZeroMQ connectors on a private repository or so. "by the way Spark has LGPL licensed packages in its NOTICE" --> did you find the discussion in their mailing list / JIRA regarding this? Maybe they contacted the authors of the code or got a spec

Re: Adding the streaming project to the main repository

2014-07-16 Thread Márton Balassi
Hi all, We've decided to do our preparations on a fork of the main repo: https://github.com/mbalassi/incubator-flink/tree/streaming-ready We've fixed the code to match the coding style and added the modules to the Maven build. h

Re: Adding the streaming project to the main repository

2014-07-14 Thread Márton Balassi
Hi guys, @Stephan: Thanks for the script, we've gone through the commits with Gabor, Gyula is reviewing it right now. https://github.com/mbalassi/incubator-flink/commits/streamrebase3 @Robert: We've gone through the coding style, the update commit is already pushed to our old repo, I'm merging it

Re: Adding the streaming project to the main repository

2014-07-14 Thread Henry Saputra
@Stephan, yes, unfortunately all the individuals who have contributed code need to send their ICLAs. Once we have resolved the open issues, we are ready to merge =) - Henry On Mon, Jul 14, 2014 at 7:58 AM, Stephan Ewen wrote: > Before adding this contribution to the project, there are some legal th

Re: Adding the streaming project to the main repository

2014-07-14 Thread Stephan Ewen
Before adding this contribution to the project, there are some legal things to do: - Obtain ICLAs from all major contributors. There are 7 in the streaming code, out of which three did the largest portion of the work: Márton Balassi, Gyula Fóra, Hermann Gábor - @mentors: Should the other 4 also

Re: Adding the streaming project to the main repository

2014-07-14 Thread Stephan Ewen
Hi guys! I made a scripted manual rebase of each commit (basically add the commit not via its diff, but such that it reflects the code base after the commit) https://github.com/StephanEwen/incubator-flink/commits/streamrebase No more merge commits that mess things up. You should be able to squas

Re: Adding the streaming project to the main repository

2014-07-14 Thread Gyula Fóra
So what we have figured out so far is that git rebasing "straightens" out the history, so all the merges will be omitted and they need to be merged again. Doing this with our 540 regular and 120 merge commits seems like overkill. In light of this, adding the streaming files as new files to t

Re: Adding the streaming project to the main repository

2014-07-14 Thread Gyula Fóra
Hey, As you have said, our commit history is indeed a little messy in some places especially regarding some duplicate commits. We tried what you suggested to rebase it with git rebase -i , but our problem is that because -i ignores the merge commits, squashing and editing names make pretty much al

Re: Adding the streaming project to the main repository

2014-07-13 Thread Henry Saputra
Thanks for the update Robert. This needs some review, so let's wait before merging to master or any branch. On Sunday, July 13, 2014, Robert Metzger wrote: > Regarding the dependencies, I found that they require "jblas", with this > license: https://github.com/mikiobraun/jblas/blob/master/COPYING > It se

Re: Adding the streaming project to the main repository

2014-07-13 Thread Robert Metzger
Regarding the dependencies, I found that they require "jblas", with this license: https://github.com/mikiobraun/jblas/blob/master/COPYING It seems to be a BSD license, which is compatible with ASF projects [1]. The connectors package depends on RabbitMQ, which is MPL Licensed: http://www.rabbitmq.

Re: Adding the streaming project to the main repository

2014-07-13 Thread Márton Balassi
Thanks for the effort. Sorry for the mess, I'll clean it up as soon as possible. Cheers, Marton On Sun, Jul 13, 2014 at 5:25 PM, Stephan Ewen wrote: > Hi everyone! > > I have found a way to add the code into the main repository in a different > branch, preserving all history. > All code is re

Re: Adding the streaming project to the main repository

2014-07-13 Thread Stephan Ewen
Hi everyone! I have found a way to add the code into the main repository in a different branch, preserving all history. All code is rewritten (with history) to be in "flink-addons/flink-streaming" and the commits are prefixed with [streaming]. https://github.com/StephanEwen/incubator-flink/commits

Re: Adding the streaming project to the main repository

2014-07-13 Thread Stephan Ewen
Good point! I will ping Marton and Gyula for that. On Sun, Jul 13, 2014 at 4:22 PM, Robert Metzger wrote: > Let's see if the variant with rewriting the history using git filter-branch > works better. > > > One other thing regarding the merge: > I'm not sure if we have to do any legal checks prio

Re: Adding the streaming project to the main repository

2014-07-13 Thread Robert Metzger
Let's see if the variant with rewriting the history using git filter-branch works better. One other thing regarding the merge: I'm not sure if we have to do any legal checks prior to merging the changes into our project. Maybe we even need an SGA or CCLA if the code has been written as part of an e

Re: Adding the streaming project to the main repository

2014-07-13 Thread Stephan Ewen
Okay, here is a try: https://github.com/StephanEwen/incubator-flink/tree/streaming/flink-addons/flink-streaming It attributes all files to my commit, but it preserves all authors in git blame. It is a bit strange, the history is broken, but some author information is preserved. Not ideal. Hope we

Re: Adding the streaming project to the main repository

2014-07-13 Thread Márton Balassi
Let us know if we can assist the merge in any way. On Sun, Jul 13, 2014 at 3:50 PM, Stephan Ewen wrote: > Okay. How do we do this, given that it is a cross-repository merge? I'll look > into Robert's reference... >

Re: Adding the streaming project to the main repository

2014-07-13 Thread Stephan Ewen
Okay, subtree merge looks promising: http://stackoverflow.com/questions/1425892/how-do-you-merge-two-git-repositories I'll give it a try...

Re: Adding the streaming project to the main repository

2014-07-13 Thread Stephan Ewen
Okay. How do we do this, given that it is a cross-repository merge? I'll look into Robert's reference...

Re: Adding the streaming project to the main repository

2014-07-13 Thread Márton Balassi
Thanks, Stephan & Robert. We'd definitely vote for merging with history as we've invested 4 months of work to reach the current stage. It is also beneficial for Flink as the merge will add 7 contributors to the project. On Sun, Jul 13, 2014 at 3:38 PM, Robert Metzger wrote: > I think it is

Re: Adding the streaming project to the main repository

2014-07-13 Thread Robert Metzger
I think it is also possible to merge the streaming project keeping its history: http://git-scm.com/book/en/Git-Tools-Subtree-Merging. I saw this recently in Optiq's JIRA. They are doing something like: git subtree add --prefix=example-csv https://github.com/julianhyde/optiq-csv.git master On S

Re: Adding the streaming project to the main repository

2014-07-13 Thread Stephan Ewen
Hi folks! I have made a version that adds the code to the flink repository. The thing is: all code is attributed to me (as the one who added the files). If you do not mind, I can commit it like that. If you want the code to be attributed to you, you need to make a pull request that puts the cont

Re: Adding the streaming project to the main repository

2014-07-12 Thread Stephan Ewen
Very nice, thanks! I'll try and merge the current state under "flink-addons/flink-streaming" today. Stephan On Fri, Jul 11, 2014 at 7:15 PM, Gyula Fóra wrote: > Hey, > > The package names in the streaming code have now been renamed to the proper > flink package names and it uses the latest fli

Adding the streaming project to the main repository

2014-07-11 Thread Gyula Fóra
Hey, The package names in the streaming code have now been renamed to the proper flink package names and it uses the latest flink snapshot as its dependencies. Also, support for iterative streaming jobs has been added to the API and we have also added support for directed emits to match the functi

Re: Adding the streaming project to the main repository

2014-07-07 Thread Gyula Fóra
The utilities that we used for performance measurements have no direct connection to this project. We thought it would make sense to move them out into a separate repo since we are constantly modifying the settings for the actual tests. On Mon, Jul 7, 2014 at 2:30 PM, Ufuk Celebi wrote: > > On

Re: Adding the streaming project to the main repository

2014-07-07 Thread Ufuk Celebi
On 07 Jul 2014, at 12:06, Márton Balassi wrote: > Yeah, this might be slightly confusing - for clarifying the situation: > > > - Right under the streaming-addons one can find basic connectors for > message queue services - at the moment Kafka and RabbitMQ. We considered > this "classical

Re: Adding the streaming project to the main repository

2014-07-07 Thread Stephan Ewen
I like the name "connectors"

Re: Adding the streaming project to the main repository

2014-07-07 Thread Márton Balassi
Yeah, this might be slightly confusing, so to clarify the situation: - Right under the streaming-addons one can find basic connectors for message queue services - at the moment Kafka and RabbitMQ. We considered this "classical" addon functionality. - Additionally the job used for pe

Re: Adding the streaming project to the main repository

2014-07-07 Thread Stephan Ewen
Hi! Thanks for the update! From my side, +1 for adding the code. One question though: What part of the code is in your "addons" project? I am wondering if that may cause confusion, because (as per the discussion via hangout last week), we want to add the streaming code initially to the "flink-ad

Re: Adding the streaming project to the main repository

2014-07-04 Thread Gyula Fóra
Hey, I would like to give a quick update on the status of the flink streaming project; all of our dependencies are now updated to the current 0.6-snapshot in our main branch, and the project is now decomposed into 3 subprojects: core, examples, and addons. We have created a separate branch for ou

Fwd: Adding the streaming project to the main repository

2014-07-02 Thread Márton Balassi
To extend the functionality of Flink, a separate branch of development was dedicated to low-latency, distributed stream processing support. Development started in March 2014 and is approaching a state where it might be considered a candidate for becoming part of the main repository. As