Re: parse json-file with scala-api and json4s

2014-08-08 Thread Aljoscha Krettek
Hi, I think it is a good way, yes. You could also handle the JSON parsing in a custom input format but this would only shift the computation to a different place. Performance should not be impacted by this. (I think parsing JSON is slow no matter what you do and no matter what cluster processing f

Re: [VOTE] Release Apache Flink 0.6 (incubating) (RC2)

2014-08-08 Thread Henry Saputra
We could add the "apache" prefix for the release artifacts. As Stephan had mentioned, Robert, could you kindly send a cancel vote email to the RC2 vote thread by prefixing [CANCEL] in the subject line? This should help keep track of different RC vote threads to vote against. Thanks, - Henry

Re: [VOTE] Release Apache Flink 0.6 (incubating) (RC2)

2014-08-08 Thread Stephan Ewen
I vote to include the fix to FLINK-909 and create RC3.

Re: [VOTE] Release Apache Flink 0.6 (incubating) (RC2)

2014-08-08 Thread Sean Owen
I would omit 'apache' just because that's what I see other Apache projects do. You're right that legal mentioned it would be best in this instance to show use of "Flink" alone as a trademark, and this helps that goal, although I think it's possible to achieve this otherwise. Yes in general though r

Re: [VOTE] Release Apache Flink 0.6 (incubating) (RC2)

2014-08-08 Thread Alan Gates
I am not aware of any requirement that the jar have apache in the name. Alan. Ufuk Celebi August 8, 2014 at 1:39 AM I've changed my opinion. I thought about this further and think we should stick to the version *without* the prefix. The legal team asked for a r

Re: Adding the streaming project to the main repository

2014-08-08 Thread Gyula Fóra
Okay, let's have a chat about this sometime, maybe we can come up with something even better. I already talked briefly with Stephan about the possibility of adjusting the output buffers at runtime, but it seemed like the automatic flushing was a far easier and possibly even better choice. (

Re: Adding the streaming project to the main repository

2014-08-08 Thread Ufuk Celebi
Thanks for the detailed explanation! Very nice to hear. :) If your flushing writer does not give you enough control for the trade-off (in general you cannot know how large records will be, right?), we can have a chat about runtime changes for this. I would be happy to help with it. In theory it

Re: Adding the streaming project to the main repository

2014-08-08 Thread Gyula Fóra
Hey guys, I might not be able to give you all the details right now, because some of the data is on my colleague's computer, but I'm gonna try :) We have a 30-machine cluster at SZTAKI with 2 cores each; not a powerhouse, but good for experimenting. We tested both Flink Streaming and Storm with a

Re: Guide/ tips to do preferred development for Flink?

2014-08-08 Thread Henry Saputra
Thanks Ufuk, yeah, originally I wanted to start a local standalone dev env without actually going to the target directory to start the server. But I will use the local environment for dev for now to get more familiar with the flow. Thanks guys! - Henry On Fri, Aug 8, 2014 at 12:36 AM, Ufuk Celebi wrote:

Re: [VOTE] Release Apache Flink 0.6 (incubating) (RC2)

2014-08-08 Thread Henry Saputra
Hi Robert, No need for prefix IMHO, makes the names shorter and always assume Flink releases will be under ASF On Fri, Aug 8, 2014 at 12:36 AM, Robert Metzger wrote: > I think we should create another release candidate since Stephan provided a > fix for a crucial bug here: > https://github.com/a

Re: parse json-file with scala-api and json4s

2014-08-08 Thread Norman Spangenberg
Hello Aljoscha, Thanks for your reply. It was really helpful. After some time to figure out the right syntax it worked perfectly. val user_interest = lines.map( line => { val parsed = parse(line) implicit lazy val formats = org.

Re: Adding the streaming project to the main repository

2014-08-08 Thread Ufuk Celebi
On 08 Aug 2014, at 16:07, Kostas Tzoumas wrote: > Wow! Incredible :-) Can you share more details about the experiments you > ran (cluster setup, jobs, etc)? Same here. :-) I would be especially interested in what you mean by "partly because of the output buffers". Best wishes, Ufuk

Re: Adding the streaming project to the main repository

2014-08-08 Thread Kostas Tzoumas
Wow! Incredible :-) Can you share more details about the experiments you ran (cluster setup, jobs, etc)? On Fri, Aug 8, 2014 at 3:53 PM, Gyula Fóra wrote: > Hey All, > > Quick weekly update on the streaming project: > > It was a good week: we implemented a lot of new features and made > consi

Adding the streaming project to the main repository

2014-08-08 Thread Gyula Fóra
Hey All, Quick weekly update on the streaming project: It was a good week: we implemented a lot of new features and made considerable progress on the API too. Most notably: - Cluster performance was measured against Storm on both simple streaming wordcount and iterative algorithm (pagerank) and Flin

[jira] [Created] (FLINK-1043) Alternative combine interface

2014-08-08 Thread Sebastian Kruse (JIRA)
Sebastian Kruse created FLINK-1043: Summary: Alternative combine interface Key: FLINK-1043 URL: https://issues.apache.org/jira/browse/FLINK-1043 Project: Flink Issue Type: Wish

Re: [GitHub] incubator-flink pull request: [FLINK-909], [FLINK945] Remove addit...

2014-08-08 Thread Stephan Ewen
The memory problem was that the last superstep was sometimes not empty. The result was then written to the released back channel. When the iterative input came through a broadcast variable (the main input of the operator came from the cache), a step with an empty bc variable, but a full regular in

Re: [VOTE] Release Apache Flink 0.6 (incubating) (RC2)

2014-08-08 Thread Kostas Tzoumas
+1 to stick to the names without prefix. Most projects do it this way, and it makes names smaller, which is good. On Fri, Aug 8, 2014 at 10:39 AM, Ufuk Celebi wrote: > > On 08 Aug 2014, at 09:38, Ufuk Celebi wrote: > > >> There is one additional issue I would like to discuss here: With this >

Re: [VOTE] Release Apache Flink 0.6 (incubating) (RC2)

2014-08-08 Thread Ufuk Celebi
On 08 Aug 2014, at 09:38, Ufuk Celebi wrote: >> There is one additional issue I would like to discuss here: With this >> release candidate, I've named the artifacts >> "apache-flink-0.6-incubating-rc2-src.tgz", my first RC had just >> "flink-0.6-incubating..", without the "apache-" prefix. >> >

Re: [VOTE] Release Apache Flink 0.6 (incubating) (RC2)

2014-08-08 Thread Ufuk Celebi
On 08 Aug 2014, at 09:36, Robert Metzger wrote: > I think we should create another release candidate since Stephan provided a > fix for a crucial bug here: > https://github.com/apache/incubator-flink/pull/91. I'm going to test his > changes and see if it fixes the error I reported. I will also

Re: Guide/ tips to do preferred development for Flink?

2014-08-08 Thread Ufuk Celebi
Hey Henry, sorry for not getting back to you earlier. The difference is that the network stack does NOT start up and all data transfers are made locally via memory copies (no TCP channels). The default parallelism is set depending on the number of cores, so stuff will still run in parallel. I

Re: [VOTE] Release Apache Flink 0.6 (incubating) (RC2)

2014-08-08 Thread Robert Metzger
I think we should create another release candidate since Stephan provided a fix for a crucial bug here: https://github.com/apache/incubator-flink/pull/91. I'm going to test his changes and see if it fixes the error I reported. Also, we need to disable the POJO serialization, as it seems broken righ