Re: Published test artifacts for flink streaming

2015-11-20 Thread Nick Dimiduk
Very interesting, Alex! One other thing I find useful in building data flows is using "builder" functions that hide the details of wiring up specific plumbing on generic input parameters. For instance, void wireFoo(DataSource source, SinkFunction sink) { ... }. It would be great to have test tools
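A minimal sketch of the "builder" idea, written in plain Java so it runs without a Flink dependency. The names WiringSketch and wireFoo, and the trim/filter/upper-case plumbing, are hypothetical. java.lang.Iterable and java.util.function.Consumer stand in for a source and a sink here; they are not Flink's DataSource or SinkFunction types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

// Hypothetical illustration: Iterable/Consumer stand in for a real source/sink.
final class WiringSketch {
    private WiringSketch() {}

    /**
     * Hypothetical "builder" function: it owns the plumbing (trim, drop blank
     * records, upper-case) but receives the endpoints as parameters, so
     * production code and tests can plug in different sources and sinks.
     */
    static void wireFoo(Iterable<String> source, Consumer<String> sink) {
        for (String record : source) {
            String trimmed = record.trim();
            if (trimmed.isEmpty()) {
                continue; // drop blank records
            }
            sink.accept(trimmed.toUpperCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        // Test-style usage: in-memory source, sink that collects into a list.
        List<String> collected = new ArrayList<>();
        wireFoo(List.of(" a ", "", "b"), collected::add);
        System.out.println(collected); // [A, B]
    }
}
```

Because the source and sink are parameters, a test can pass an in-memory collection and a collecting sink while production code passes real connectors, which keeps the plumbing itself testable.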

placement preferences for streaming jobs

2015-11-20 Thread Stefania Costache
Hi, I have started using Flink and I am wondering if it is possible to specify placement preferences for the streaming jobs. More precisely, if I run Flink in stand-alone mode on a cluster and I submit a streaming job to it, is there a way to ask for the job or for some of its tasks to run on s

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Robert Metzger
Hi, most users don't have that choice, they have to use Flink on YARN. Both modes have their advantages and disadvantages, but the decision is up to you. You can use a little bit more of your memory using the standalone mode, but you'll have to install Flink manually on all machines. Regarding th

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
Hi Robert, In this case, if both the standalone and YARN modes will run jobs when they have resources, which one is it better to rely on? I would be interested in a feature like the dynamic resource allocation with a fair scheduler that Spark has implemented. If you guys will consider this feature

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Robert Metzger
Hi Ovidiu, good choice on your research topic ;) I think doing some hands on experiments will help you to understand much better how Flink works and what you can do with it. If I got it right: > -with standalone (cluster) you can run multiple workloads if you have > enough resources, else the jo

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
Thank you, Robert! My research interest includes Flink (I am a PhD student, BigStorage EU project, Inria Rennes) so I am currently preparing some experiments in order to understand better how it works. If I got it right: -with standalone (cluster) you can run multiple workloads if you have enou

Re: Compiler Exception

2015-11-20 Thread Truong Duc Kien
Hi Till, Thank you very much. Looking forward to trying the fix. Best, Kien On Fri, Nov 20, 2015 at 12:38 PM, Till Rohrmann wrote: > Hi Kien Truong, > > I found a solution to your problem. It's actually a bug in Flink's > optimizer. Thanks for spotting it :-) > > I've opened a pull request to

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Robert Metzger
Hi, I'll fix the link in the YARN documentation. Thank you for reporting the issue. I'm not aware of any discussions or implementations related to the scheduling. From my experience working with users and also from the mailing list, I don't think that such features are very important. Since stream

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
Hi, The link to FAQ (https://ci.apache.org/projects/flink/flink-docs-release-0.10/faq.html ) is on the yarn setup 0.10 documentation page (https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/yarn_setup.html

Re: Compiler Exception

2015-11-20 Thread Till Rohrmann
Hi Kien Truong, I found a solution to your problem. It's actually a bug in Flink's optimizer. Thanks for spotting it :-) I've opened a pull request to fix it ( https://github.com/apache/flink/pull/1388). The fix will also be included in the upcoming `0.10.1` release. After the pull request has be

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Robert Metzger
Hi Ovidiu, you can submit multiple programs to a running Flink cluster (or a YARN session). Flink currently does not have any queuing mechanism. The JobManager will reject a program if there are not enough free resources for it. If there are enough resources for multiple programs, they'll run conc
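A toy model of the admission behavior described above, assuming resources are counted as task slots. AdmissionSketch and its methods are hypothetical and are not the JobManager's actual code; the point is only that a program either gets its resources immediately or is rejected, and that nothing waits in a queue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model, not Flink code: admission without a queue.
final class AdmissionSketch {
    private final int totalSlots;
    private final Map<String, Integer> running = new LinkedHashMap<>();

    AdmissionSketch(int totalSlots) {
        this.totalSlots = totalSlots;
    }

    int freeSlots() {
        int used = 0;
        for (int slots : running.values()) {
            used += slots;
        }
        return totalSlots - used;
    }

    /** Returns true if the job starts now; false means rejected (never queued). */
    boolean submit(String job, int slotsNeeded) {
        if (running.containsKey(job)) {
            throw new IllegalArgumentException("already running: " + job);
        }
        if (slotsNeeded > freeSlots()) {
            return false; // not enough free resources: reject immediately
        }
        running.put(job, slotsNeeded);
        return true;
    }

    /** Releases the slots held by a finished job. */
    void finish(String job) {
        running.remove(job);
    }

    public static void main(String[] args) {
        AdmissionSketch cluster = new AdmissionSketch(8);
        System.out.println(cluster.submit("jobA", 4)); // true
        System.out.println(cluster.submit("jobB", 4)); // true, runs concurrently with jobA
        System.out.println(cluster.submit("jobC", 2)); // false, rejected rather than queued
        cluster.finish("jobA");
        System.out.println(cluster.submit("jobC", 2)); // true, but only after resubmission
    }
}
```

In this model, any waiting or retrying has to happen on the client side, since a rejected submission is simply dropped.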

Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
Hi, I am currently interested in experimenting with Flink on Hadoop YARN. I am working from the documentation here: https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/yarn_setup.html

Re: finite subset of an infinite data stream

2015-11-20 Thread Aljoscha Krettek
Hi, I’m very sorry, yes you would need my custom branch: https://github.com/aljoscha/flink/commits/state-enhance Cheers, Aljoscha > On 20 Nov 2015, at 10:13, rss rss wrote: > > Hello Aljoscha, > > many thanks. I tried to build your example but have an obstacle with > org.apache.flink.runtim

Re: finite subset of an infinite data stream

2015-11-20 Thread rss rss
Hello Aljoscha, many thanks. I tried to build your example but ran into an obstacle with the org.apache.flink.runtime.state.AbstractStateBackend class. Where can I get it? I guess it is stored only in your local branch. Would you please send me patches for the public branch or share the branch with me? Best r