[jira] [Created] (FLINK-2322) Unclosed stream may leak resource

2015-07-06 Thread Ted Yu (JIRA)
Ted Yu created FLINK-2322: Summary: Unclosed stream may leak resource Key: FLINK-2322 URL: https://issues.apache.org/jira/browse/FLINK-2322 Project: Flink Issue Type: Bug Reporter: Ted Y

How do network transmissions in Flink work?

2015-07-06 Thread Niklas Semmler
Hello Flink Community, I am working on a network scheduler and am currently reading Flink's code to figure out how the data exchange works. It would be great if you could help me with some of my issues and questions. Basically I want to extract from flink the time when a data transmission be

Design documents for consolidated DataStream API

2015-07-06 Thread Stephan Ewen
Hi all! As many of you know, there are ongoing efforts to consolidate the streaming API for the next release, and then graduate it (from beta status). In the process of this consolidation, we want to achieve the following goals. - Make the code more robust and simplify it in parts - Clearly

Flink on Wikipedia

2015-07-06 Thread Matthias J. Sax
Hi squirrels, I am happy to announce Flink on Wikipedia: https://en.wikipedia.org/wiki/Apache_Flink The Logo Request is still pending, but should be online soon. -Matthias

Re: [ml] Convergence Criterias

2015-07-06 Thread Theodore Vasiloudis
> > The point is to provide user with the solution before an iteration and > Am I correct to assume that by "user" you mean library developers here? Regular users who just use the API are unlikely to write their own convergence criterion function, yes? They would just set a value, for example the

Re: Redesigned "Features" page

2015-07-06 Thread Stephan Ewen
Thanks Max! Did not even know we had a github mirror of the flink-web repo... On Mon, Jul 6, 2015 at 6:05 PM, Maximilian Michels wrote: > Hi Stephan, > > Thanks for the feature page update. I think it is much more informative and > better structured now. > > By the way, you could also open a pu

Re: Redesigned "Features" page

2015-07-06 Thread Maximilian Michels
Hi Stephan, Thanks for the feature page update. I think it is much more informative and better structured now. By the way, you could also open a pull request for your changes on https://github.com/apache/flink-web/pulls Cheers, Max On Mon, Jul 6, 2015 at 3:28 PM, Fabian Hueske wrote: > I'll

Re: [ml] Convergence Criterias

2015-07-06 Thread Sachin Goel
Sure. Usually, the convergence criterion can be user defined. For example, for a linear regression problem, user might want to run the training until the relative change in squared error falls below a specific threshold, or the weights fail to shift by a relative or absolute percentage. Similarly,
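The relative-change criterion mentioned in this message can be sketched as follows. This is only an illustration in plain Python, not FlinkML code: the names `relative_change`, `has_converged`, and `train_until_converged`, the default threshold, and the iteration cap are all assumptions made for the example.

```python
def relative_change(previous, current):
    """Relative change between two successive error values."""
    if previous == 0.0:
        # Avoid division by zero: no change if both are zero, otherwise unbounded.
        return 0.0 if current == 0.0 else float("inf")
    return abs(previous - current) / abs(previous)


def has_converged(previous_error, current_error, threshold=1e-3):
    """True when the error changed by less than `threshold`, relatively."""
    return relative_change(previous_error, current_error) < threshold


def train_until_converged(step, initial_error, threshold=1e-3, max_iterations=100):
    """Call step(error) -> new_error until the criterion holds or the cap is hit.

    `step` is a hypothetical stand-in for one training iteration.
    Returns (last_error, iterations_run).
    """
    error = initial_error
    for iteration in range(1, max_iterations + 1):
        new_error = step(error)
        if has_converged(error, new_error, threshold):
            return new_error, iteration
        error = new_error
    return error, max_iterations
```

A user-supplied criterion would replace `has_converged`, for example one that compares weight vectors instead of error values.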

Re: Read 727 gz files ()

2015-07-06 Thread Stephan Ewen
4 mio file handles should be enough ;-) Is that the system global max, or the user's max? If the user's max is lower, this may be the issue... On Mon, Jul 6, 2015 at 3:50 PM, Felix Neutatz wrote: > So do you know how to solve this issue apart from increasing the current > file-max (4748198)? >
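The distinction raised here between the system-wide maximum and the per-user (per-process) maximum can be checked with a short script. This is a minimal sketch assuming a Linux host: it reads `/proc/sys/fs/file-max` for the global value and uses Python's standard `resource` module for the process limit. The function names are hypothetical.

```python
import resource


def per_process_open_file_limit():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_wide_open_file_limit(path="/proc/sys/fs/file-max"):
    """Return the kernel-wide file handle maximum, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        # Not Linux, or the file is missing or unreadable.
        return None


if __name__ == "__main__":
    soft, hard = per_process_open_file_limit()
    print("per-process soft limit:", soft)
    print("per-process hard limit:", hard)
    print("system-wide maximum:", system_wide_open_file_limit())
```

If the soft limit is far below the system-wide value, the per-process limit is the one a job would hit first.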

Re: Read 727 gz files ()

2015-07-06 Thread Felix Neutatz
So do you know how to solve this issue apart from increasing the current file-max (4748198)? 2015-07-06 15:35 GMT+02:00 Stephan Ewen : > I think the error is pretty much exactly in the stack trace: > > Caused by: java.io.FileNotFoundException: > /data/4/hadoop/tmp/flink-io-0e2460bf-964b-4883-8eee

[jira] [Created] (FLINK-2321) The seed for the SVM classifier is currently static

2015-07-06 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2321: Summary: The seed for the SVM classifier is currently static Key: FLINK-2321 URL: https://issues.apache.org/jira/browse/FLINK-2321 Project: Flink

Re: Read 727 gz files ()

2015-07-06 Thread Stephan Ewen
I think the error is pretty much exactly in the stack trace: Caused by: java.io.FileNotFoundException: /data/4/hadoop/tmp/flink-io-0e2460bf-964b-4883-8eee-12869b9476ab/ 995a38a2c92536383d0057e3482999a9.000329.channel (Too many open files in system) On Mon, Jul 6, 2015 at 3:31 PM, Felix Neutatz

Read 727 gz files ()

2015-07-06 Thread Felix Neutatz
Hi, I want to do some simple aggregations on 727 gz files (68 GB total) from HDFS. See code here: https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/main/scala/io/sanfran/wikiTrends/extraction/flink/Stats.scala We are using a Flink-0.9 SNAPSHOT. I get the following error: Ca

Re: Hadoopinputformat for numpy und .mat matrices

2015-07-06 Thread Stephan Ewen
Hi Felix! I am not aware that any of those exist specifically for Flink. But, as always, if there is a Hadoop variant, you can use it with Flink as well ;-) Stephan On Wed, Jul 1, 2015 at 5:05 PM, Felix Neutatz wrote: > Hi everybody, > > does anybody know whether there is an implementation o

Re: Redesigned "Features" page

2015-07-06 Thread Fabian Hueske
I'll be happy to help, eh draw ;-) 2015-07-06 15:22 GMT+02:00 Stephan Ewen : > Hi all! > > I think that the "Features" page of the website is a bit out of date. > > I made an effort to stub a new one. It is committed under "features_new.md > " > and not yet built as an HTML page. > > If you want

Redesigned "Features" page

2015-07-06 Thread Stephan Ewen
Hi all! I think that the "Features" page of the website is a bit out of date. I made an effort to stub a new one. It is committed under "features_new.md" and not yet built as an HTML page. If you want to take a look and help building this, pull the flink-web git repository and build the website

[jira] [Created] (FLINK-2320) Enable DataSet DataStream Joins

2015-07-06 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2320: Summary: Enable DataSet DataStream Joins Key: FLINK-2320 URL: https://issues.apache.org/jira/browse/FLINK-2320 Project: Flink Issue Type: New Feature

Re: Flink 0.9 built with Scala 2.11

2015-07-06 Thread Alexander Alexandrov
> Because we are using Scala in our runtime, all modules are Scala dependent module. If all modules will need the suffix after your PR is merged, why would you talk about pure/non-pure distinction in the documentation? This adds complexity and may cause confusion which at the moment can be spared.

Re: [ml] Convergence Criterias

2015-07-06 Thread Theodore Vasiloudis
Hello Sachin, could you share the motivation behind this? The iterateWithTermination function provides us with a means of checking for convergence during iterations, and checking for convergence depends highly on the algorithm being implemented. It could be the relative change in error, it could d