Re: [ml] Convergence Criterias

2015-07-07 Thread Till Rohrmann
I think Sachin wants to provide something similar to the LossFunction but for the convergence criterion. This would mean that the user can specify a convergence calculator, for example to the optimization framework, which is used from within an iterateWithTermination call. I think this is a good
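
A minimal sketch of the idea discussed above, in plain Python rather than Flink code: the convergence criterion is passed in as a pluggable function, much as a loss function would be. The names `ConvergenceCriterion`, `absolute_change`, and `iterate_with_termination` are hypothetical and only mirror the thread's wording; they are not Flink's API.

```python
from typing import Callable, Tuple

# Hypothetical signature: a convergence criterion compares two successive
# solutions and decides whether the iteration may stop.
ConvergenceCriterion = Callable[[float, float], bool]


def absolute_change(tolerance: float) -> ConvergenceCriterion:
    """Stop once the solution moves by less than `tolerance` in one step."""
    return lambda previous, current: abs(current - previous) < tolerance


def iterate_with_termination(
    step: Callable[[float], float],
    initial: float,
    converged: ConvergenceCriterion,
    max_iterations: int,
) -> Tuple[float, int]:
    """Apply `step` until `converged` fires or `max_iterations` is reached."""
    current = initial
    for iteration in range(1, max_iterations + 1):
        previous, current = current, step(current)
        if converged(previous, current):
            return current, iteration
    return current, max_iterations


# Gradient descent on f(x) = (x - 3)^2 with learning rate 0.1.
solution, iterations = iterate_with_termination(
    step=lambda x: x - 0.1 * 2.0 * (x - 3.0),
    initial=0.0,
    converged=absolute_change(1e-6),
    max_iterations=1000,
)
print(solution, iterations)
```

Swapping `absolute_change` for another function with the same signature changes the stopping rule without touching the optimizer loop, which appears to be the separation the proposal is after.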

[jira] [Created] (FLINK-2324) Rework partitioned state storage

2015-07-07 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2324: Summary: Rework partitioned state storage Key: FLINK-2324 URL: https://issues.apache.org/jira/browse/FLINK-2324 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-2323) Rename OperatorState methods to .value() and .update(..)

2015-07-07 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2323: Summary: Rename OperatorState methods to .value() and .update(..) Key: FLINK-2323 URL: https://issues.apache.org/jira/browse/FLINK-2323 Project: Flink Issue Type:

Re: Flink on Wikipedia

2015-07-07 Thread Stephan Ewen
It was just a suggestion. @Matthias: You wrote the article, you decide. If you want to keep it, that's fine! On Tue, Jul 7, 2015 at 4:57 PM, Matthias J. Sax mj...@informatik.hu-berlin.de wrote: I agree with Kostas and don't see much danger that people get confused. Nevertheless, I will

Re: Read 727 gz files ()

2015-07-07 Thread Felix Neutatz
Yes, that may be the problem. The user max is set to 100,000 open files. 2015-07-06 15:55 GMT+02:00 Stephan Ewen se...@apache.org: 4 million file handles should be enough ;-) Is that the system global max, or the user's max? If the user's max is lower, this may be the issue... On Mon, Jul 6,
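
To tell the two limits apart, the per-process limit can be read from inside a process. The sketch below uses Python's standard `resource` module (Unix only) and assumes that the "user's max" in the thread refers to the per-process open-file limit, the one `ulimit -n` reports. On Linux the system-wide maximum is exposed separately in `/proc/sys/fs/file-max`.

```python
import resource  # Unix-only standard library module

# RLIMIT_NOFILE is the maximum number of open file descriptors per process.
# The soft limit is the one actually enforced; it can be raised up to the
# hard limit without extra privileges.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"per-process open files: soft={soft_limit}, hard={hard_limit}")
```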

Re: Flink on Wikipedia

2015-07-07 Thread Matthias J. Sax
Well. It is not my article. It is on Wikipedia. Anyone can (and should) improve it! On 07/07/2015 05:08 PM, Stephan Ewen wrote: It was just a suggestion. @Matthias: You wrote the article, you decide. If you want to keep it, that's fine! On Tue, Jul 7, 2015 at 4:57 PM, Matthias J. Sax

Building several models in parallel

2015-07-07 Thread Felix Neutatz
Hi, at the moment I have a dataset which looks like this: DataSet[model_ID, DataVector] data So what I want to do is group by the model_ID and build for each model_ID one regression model in pseudo code: data.groupBy(model_ID) -- MultipleLinearRegression().fit(data_grouped) Is there
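
The pattern asked about (group by model_ID, then fit one model per group) can be sketched in plain Python as follows. This is only an illustration of the grouping logic, not Flink or FlinkML code: `fit_line` is a hypothetical single-feature least-squares stand-in for MultipleLinearRegression, and it assumes each group has at least two distinct x values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, float]  # (feature x, label y); one feature for brevity


def fit_line(samples: List[Sample]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_model(
    data: Iterable[Tuple[str, Sample]]
) -> Dict[str, Tuple[float, float]]:
    """Group samples by model ID, then fit one model per group."""
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for model_id, sample in data:
        groups[model_id].append(sample)
    return {model_id: fit_line(samples) for model_id, samples in groups.items()}


data = [
    ("a", (0.0, 1.0)), ("a", (1.0, 3.0)), ("a", (2.0, 5.0)),    # y = 2x + 1
    ("b", (0.0, 0.0)), ("b", (1.0, -1.0)), ("b", (2.0, -2.0)),  # y = -x
]
models = fit_per_model(data)
print(models)
```

Because each group is fitted independently, the per-group calls have no shared state and could in principle run in parallel.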

Re: Redesigned Features page

2015-07-07 Thread Gyula Fóra
I think the content is pretty good, much better than before. But the page structure could be better (and this is very important in my opinion). Now it just looks like a long list of features without any ways to navigate between them. We should probably have something at the top that summarizes the

Re: Redesigned Features page

2015-07-07 Thread Stephan Ewen
I actually put quite some thought into the structure of the points. They reflect pretty much what I observed (meetups and talks) where people get excited and what they are missing. The structure follows the line of thought of a stream processor that also does batch very well. And then separate the

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
Sorry Stephan, I meant it slightly differently, I see your point: DataStream source = ... SingleInputOperator mapper = source.map(...) mapper.addInput() So the add input would be a method of the operator, not the stream. Aljoscha Krettek aljos...@apache.org wrote (date: Tue, Jul 7, 2015,

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
@Kostas: This new API is, I believe, equivalent in expressivity to the current one. We can define nested loops now as well. And I also don't see nested loops as much worse in general than simple loops. Gyula Fóra gyula.f...@gmail.com wrote (date: Tue, Jul 7, 2015, 16:14): Sorry Stephan I

Re: Flink on Wikipedia

2015-07-07 Thread Stephan Ewen
Thanks, Matthias, for starting this. It looks a bit like the article talks more about the Stratosphere project than Flink right now. I think we need to make a few things clear, to not confuse people: 1) Flink != Stratosphere. When looking at the Stratosphere Paper and when looking at Flink, you

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
@Aljoscha: Yes, that's basically my point as well. This is what happens now too, but we give this mutable datastream a special name: IterativeDataStream. This can be handled in very different ways through the API; the goal would be to make something easy to use. I am fine with what we have now

Re: Design documents for consolidated DataStream API

2015-07-07 Thread Aljoscha Krettek
Hi, I just noticed that we don't have anything about how iterations and timestamps/watermarks should interact. Cheers, Aljoscha On Mon, 6 Jul 2015 at 23:56 Stephan Ewen se...@apache.org wrote: Hi all! As many of you know, there are ongoing efforts to consolidate the streaming API for the

[Gelly] Help with GSA compiler tests

2015-07-07 Thread Vasiliki Kalavri
Hello to my squirrels, I've started looking into FLINK-1943 https://issues.apache.org/jira/browse/FLINK-1943 and I need some help to understand what to test and how to do it properly. In the corresponding Spargel compiler test, the following functionality is checked: 1. sink: the ship strategy

Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
Hey, along with the suggested changes to the streaming API structure I think we should also rework the iteration API. Currently the iteration API tries to mimic the syntax of the batch API while the runtime behaviour is quite different. What we create instead of iterations is really just cyclic

Re: Building several models in parallel

2015-07-07 Thread Felix Schüler
Hi Felix! We had a similar use case and I trained multiple models on partitions of my data with mapPartition and the model parameters (weights) as a broadcast variable. If I understood broadcast variables in Flink correctly, you should end up with one model on each TaskManager. Does that work?

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
Okay, I am fine with this approach as well; I see the advantages. Then we just need to find a suitable name for marking a FeedbackPoint :) Stephan Ewen se...@apache.org wrote (date: Tue, Jul 7, 2015, 16:28): In Aljoscha's approach, we would need a special mutable stream. We could do it

[jira] [Created] (FLINK-2326) Mutitenancy on Yarn

2015-07-07 Thread LINTE (JIRA)
LINTE created FLINK-2326: Summary: Mutitenancy on Yarn Key: FLINK-2326 URL: https://issues.apache.org/jira/browse/FLINK-2326 Project: Flink Issue Type: Improvement Components: YARN Client

Re: Flink on Wikipedia

2015-07-07 Thread Stephan Ewen
Okay, I wrote a lot there. tl;dr: Let's make sure people understand that the Stratosphere paper does not describe Flink. On Tue, Jul 7, 2015 at 4:33 PM, Matthias J. Sax mj...@informatik.hu-berlin.de wrote: I can't follow. Stratosphere is only mentioned in the History part. Of course, we can

Re: Flink on Wikipedia

2015-07-07 Thread Kostas Tzoumas
I think it is clear to most people that the only official and (hopefully) up-to-date description of an Apache project is its Apache website, and any paper can get outdated. Perhaps we can change the link to a more up-to-date paper when we have one. I like the article, thanks Matthias! Kostas On

Re: Rework of streaming iteration API

2015-07-07 Thread Paris Carbone
Good points. If we want structured loops on streaming, we will need to inject iteration counters. The question is whether we really need structured iterations on plain data streams. Window iterations are a must-have, on the other hand... Paris On 07 Jul 2015, at 16:43, Kostas Tzoumas
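
One way to picture the iteration counters mentioned above is a feedback loop in which every record carries the number of rounds it has completed and leaves the loop once a bound is reached. The sketch below is plain Python with a hypothetical `run_feedback_loop` helper; it models the cyclic flow with an in-memory queue and says nothing about how Flink would implement it.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def run_feedback_loop(
    inputs: Iterable[int],
    step: Callable[[int], int],
    max_rounds: int,
) -> List[Tuple[int, int]]:
    """Feed records back through `step` until each has done `max_rounds`."""
    # Each queue entry is (value, rounds completed so far).
    queue = deque((value, 0) for value in inputs)
    finished: List[Tuple[int, int]] = []
    while queue:
        value, rounds = queue.popleft()
        if rounds == max_rounds:
            finished.append((value, rounds))         # record exits the loop
        else:
            queue.append((step(value), rounds + 1))  # record is fed back
    return finished


results = run_feedback_loop([1, 2, 3], lambda v: v * 2, max_rounds=3)
print(results)
```

Without the counter, a record in a purely cyclic flow has no way of knowing which pass it is on, which is why the counter has to travel with the data.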

Re: Flink on Wikipedia

2015-07-07 Thread Matthias J. Sax
I agree with Kostas and don't see much danger that people get confused. Nevertheless, I will update the history section accordingly. On 07/07/2015 04:48 PM, Kostas Tzoumas wrote: I think it is clear to most people that the only official and (hopefully) up-to-date description of an Apache