Re: [DISCUSS] Canceling Streaming Jobs

2015-05-26 Thread Gyula Fóra
Hey, I would also strongly prefer the second option; users need to have the option to force-cancel a program in case of some unwanted behaviour. Cheers, Gyula Matthias J. Sax wrote (on Wed, May 27, 2015, 1:20): > Hi, > > currently, the only way to stop a streaming job is to "ca

[DISCUSS] Canceling Streaming Jobs

2015-05-26 Thread Matthias J. Sax
Hi, currently, the only way to stop a streaming job is to "cancel" the job. This has multiple disadvantages: 1) a "clean" stop is not possible (see https://issues.apache.org/jira/browse/FLINK-1929 -- I think a clean stop is a prerequisite for FLINK-1929) and 2) as a minor issue, all cancel

Re: SQL on Flink

2015-05-26 Thread Ted Dunning
It would also be relatively simple (I think) to retarget Drill to Flink if Flink doesn't provide enough typing metadata to do traditional SQL. On Tue, May 26, 2015 at 12:52 PM, Fabian Hueske wrote: > Hi, > > Flink's Table API is pretty close to what SQL provides. IMO, the best > approach woul

Re: SQL on Flink

2015-05-26 Thread Fabian Hueske
Hi, Flink's Table API is pretty close to what SQL provides. IMO, the best approach would be to leverage that and build a SQL parser (maybe together with a logical optimizer) on top of the Table API. The parser (and optimizer) could be built using Apache Calcite, which provides exactly this. Since

SQL on Flink

2015-05-26 Thread Timo Walther
Hey everyone, I would be interested in having a complete SQL API in Flink. What is the status there? Is someone already working on it? If not, I would like to work on it. I found http://ijcsi.org/papers/IJCSI-12-1-1-169-174.pdf but I couldn't find anything on the mailing list or Jira. Otherwise

Re: [DISCUSS] Dedicated streaming mode

2015-05-26 Thread Henry Saputra
Ah yes, technically the streaming mode could run batch jobs as well in Flink. I am thinking that it could cause confusion for users, since most systems that do batch and stream (well, pretty much Spark ^_^) do not differentiate the deployment topologies for the cluster to support different mode

Re: [DISCUSS] Dedicated streaming mode

2015-05-26 Thread Stephan Ewen
The streaming mode runs batch jobs as well :-) There should be slightly reduced predictability in the memory management in the streaming mode, but otherwise there should not be a problem. So if you want to run mixed workloads, you start the streaming mode. (Note: Currently, the batch mode runs

Re: [DISCUSS] Dedicated streaming mode

2015-05-26 Thread Henry Saputra
One immediate concern I have is the deployment topology. With streaming having its own cluster deployment, this means that in standalone mode, if ops would like to deploy Flink, they have to know which mode to deploy Flink in, either batch or streaming. So, if the use case was to support both batc

[jira] [Created] (FLINK-2096) Remove implicit conversions in Streaming Scala API

2015-05-26 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2096: --- Summary: Remove implicit conversions in Streaming Scala API Key: FLINK-2096 URL: https://issues.apache.org/jira/browse/FLINK-2096 Project: Flink Issue

Re: Package multiple jobs in a single jar

2015-05-26 Thread Flavio Pompermaier
I agree with Matthias, I didn't know about the ProgramDescription and Program interfaces because they are not advertised anywhere. On Tue, May 26, 2015 at 5:01 PM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote: > I see your point. > > However, right now only a few people are aware of "ProgramDe

Re: Package multiple jobs in a single jar

2015-05-26 Thread Matthias J. Sax
I see your point. However, right now only a few people are aware of the "ProgramDescription" interface. If we want to "advertise" it, it should be used (at least) in a few examples. Otherwise, people will never use it, and the changes I plan to apply are kind of useless. I would even claim that the

[jira] [Created] (FLINK-2095) Screenshots missing in webclient documentation

2015-05-26 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2095: Summary: Screenshots missing in webclient documentation Key: FLINK-2095 URL: https://issues.apache.org/jira/browse/FLINK-2095 Project: Flink Issue Type: Bug

Re: Package multiple jobs in a single jar

2015-05-26 Thread Maximilian Michels
Sorry, my bad. Yes, it is helpful to have a separate program and parameter description in ProgramDescription. I'm not sure if it adds much value to implement ProgramDescription in the examples. It introduces verbosity and might give the impression that you have to implement ProgramDescription in yo

Re: Gelly Blog Post

2015-05-26 Thread Aljoscha Krettek
Very good, I made some comments and suggestions inline. On Tue, May 26, 2015 at 2:36 PM, Stephan Ewen wrote: > Wow, this is impressive :-) > > Amazing work, Gelly folks! > > On Tue, May 26, 2015 at 10:03 AM, Andra Lungu wrote: > >> Hey everyone, >> >> We are very excited to share the first stabl

Re: Gelly Blog Post

2015-05-26 Thread Stephan Ewen
Wow, this is impressive :-) Amazing work, Gelly folks! On Tue, May 26, 2015 at 10:03 AM, Andra Lungu wrote: > Hey everyone, > > We are very excited to share the first stable draft of the Gelly blog post > with you :D > > > https://docs.google.com/document/d/1FMtpwKSE3kY7RfH082LzQpWrY6o-fdZVxqam

[jira] [Created] (FLINK-2094) Implement Word2Vec

2015-05-26 Thread Nikolaas Steenbergen (JIRA)
Nikolaas Steenbergen created FLINK-2094: --- Summary: Implement Word2Vec Key: FLINK-2094 URL: https://issues.apache.org/jira/browse/FLINK-2094 Project: Flink Issue Type: Improvement

Re: Package multiple jobs in a single jar

2015-05-26 Thread Matthias J. Sax
Hi Max, thanks for your feedback. I guess you are confusing the interfaces "Program" and "ProgramDescription". With "Program", the main method is replaced by "getPlan(...)". However, "ProgramDescription" only adds the method "getDescription()", which returns a string that explains the usage of the pro

Re: Package multiple jobs in a single jar

2015-05-26 Thread Maximilian Michels
I don't think `getDisplayName()` is necessary either. The class name and the description string should be fine. Adding ProgramDescription to the examples is not necessary; as already pointed out, using the main method is more convenient for most users. As far as I know, the idea of the ParameterToo

[jira] [Created] (FLINK-2093) Add a difference method to Gelly's Graph class

2015-05-26 Thread Andra Lungu (JIRA)
Andra Lungu created FLINK-2093: -- Summary: Add a difference method to Gelly's Graph class Key: FLINK-2093 URL: https://issues.apache.org/jira/browse/FLINK-2093 Project: Flink Issue Type: New Feat

Re: Changed the behavior of "DataSet.print()"

2015-05-26 Thread Robert Metzger
I've filed a JIRA to update the documentation: https://issues.apache.org/jira/browse/FLINK-2092 On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen wrote: > Hi all! > > We merged a patch yesterday that changed the API behavior of the > "DataSet.print()" function. > > "print()" now prints to stdout on

[jira] [Created] (FLINK-2092) Document (new) behavior of print() and execute()

2015-05-26 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2092: - Summary: Document (new) behavior of print() and execute() Key: FLINK-2092 URL: https://issues.apache.org/jira/browse/FLINK-2092 Project: Flink Issue Type:

Re: [DISCUSS] Dedicated streaming mode

2015-05-26 Thread Maximilian Michels
+1 great changes coming up! I like the idea that, ultimately, Flink will handle streaming and batch programs equally well independently of the chosen cluster startup mode. What is the time frame for these changes? On Tue, May 26, 2015 at 7:34 AM, Henry Saputra wrote: > Thanks Aljoscha and Steph

Re: [DISCUSS] Behaviour of startNewChain() in Streaming

2015-05-26 Thread Maximilian Michels
I second Aljoscha's and Matthias' opinion on the behavior of `startNewChain()`. In the case of `setParallelism(..)`, we set the parallelism of the operator but in case of `startNewChain()`, we explicitly start a new chain; for the user, this is not connected to the previous operation even though th

Gelly Blog Post

2015-05-26 Thread Andra Lungu
Hey everyone, We are very excited to share the first stable draft of the Gelly blog post with you :D https://docs.google.com/document/d/1FMtpwKSE3kY7RfH082LzQpWrY6o-fdZVxqambIiC_rU/edit?usp=sharing *Feedback* is welcome, as usual! Andra

[jira] [Created] (FLINK-2091) Lock contention during release of network buffer pools

2015-05-26 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2091: -- Summary: Lock contention during release of network buffer pools Key: FLINK-2091 URL: https://issues.apache.org/jira/browse/FLINK-2091 Project: Flink Issue Type:

[jira] [Created] (FLINK-2090) toString of CollectionInputFormat takes a long time when the collection is huge

2015-05-26 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2090: Summary: toString of CollectionInputFormat takes a long time when the collection is huge Key: FLINK-2090 URL: https://issues.apache.org/jira/browse/FLINK-2090 Project:

[jira] [Created] (FLINK-2089) "Buffer recycled" IllegalStateException during cancelling

2015-05-26 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2089: -- Summary: "Buffer recycled" IllegalStateException during cancelling Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Typ