Re: how to write dataset in a file?

2015-11-20 Thread Suneel Marthi
You can write to a single output file by setting the parallelism to 1: final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(); env.setParallelism(1); The reason you see multiple output files is that each worker writes to a different file. On Fri, Nov 20, 2015 at 10:06 PM
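To illustrate why the parallelism setting controls the number of output files, here is a minimal plain-Java sketch. It does not use the Flink API: the class name PartitionedWriteSketch, the round-robin split, and the numbered file names are assumptions made for illustration only. Each simulated worker writes its own file, so a parallelism of 1 leaves exactly one file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parallel file sink; not part of Flink.
class PartitionedWriteSketch {

    // Splits the records round-robin across `parallelism` simulated workers and
    // lets each worker write its own numbered file inside `outputDir`.
    static List<Path> write(List<String> records, Path outputDir, int parallelism)
            throws IOException {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        Files.createDirectories(outputDir);

        // One bucket of records per simulated worker.
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            partitions.get(i % parallelism).add(records.get(i));
        }

        // Each worker writes exactly one file, named after its index (assumed naming).
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            Path file = outputDir.resolve(String.valueOf(i + 1));
            Files.write(file, partitions.get(i), StandardCharsets.UTF_8);
            files.add(file);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e");
        Path base = Files.createTempDirectory("partitioned-write");
        System.out.println("parallelism 4: " + write(records, base.resolve("p4"), 4).size() + " file(s)");
        System.out.println("parallelism 1: " + write(records, base.resolve("p1"), 1).size() + " file(s)");
    }
}
```

With a parallelism of 1, the single file holds every record in input order, which is the behavior the reply above relies on.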

how to write dataset in a file?

2015-11-20 Thread jun aoki
Hi Flink community, I know I'm mistaken but could not find what I want. final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(); DataSet data = env.readTextFile("file:///text1.txt"); FilterFunction filter = new MyFilterFunction(); // looks for lines that start with "[ERROR]" D
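The body of MyFilterFunction is not shown in the message, so the following is only a guess at what such a filter might look like, written in plain Java without Flink on the classpath. The class name ErrorLineFilterSketch and the helper keepErrors are hypothetical. In a Flink job the same predicate would live in a class implementing org.apache.flink.api.common.functions.FilterFunction<String>, whose filter method returns true for records to keep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the predicate behind MyFilterFunction; not the author's code.
class ErrorLineFilterSketch {

    // Returns true when the line should be kept, i.e. it starts with "[ERROR]".
    static boolean filter(String line) {
        return line != null && line.startsWith("[ERROR]");
    }

    // Applies the predicate to a list, the way a filter operator would apply it per record.
    static List<String> keepErrors(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (filter(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "[INFO] started",
                "[ERROR] disk full",
                "  [ERROR] indented, so not a prefix match",
                "[ERROR] timeout");
        System.out.println(keepErrors(lines));
    }
}
```

Note that startsWith is a strict prefix check, so a line with leading whitespace before "[ERROR]" is dropped; trimming first would be a design choice the original message does not settle.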

Re: Storm Compatibility

2015-11-20 Thread Matthias J. Sax
Multiple inputs per bolt are currently not supported. :( FlinkTopologyBuilder has a bug. There is already a JIRA for it: https://issues.apache.org/jira/browse/FLINK-2837 I already know how to fix it (I hope to get it into 0.10.1). Removing FlinkTopologyBuilder does make sense (I did not do it bec

Re: Re: Storm Compatibility

2015-11-20 Thread Maximilian Michels
I thought about the API changes again. It probably does make sense to keep the LocalCluster and StormSubmitter equivalent classes. That way, we don't break the Storm API too much. Users can stick to the pattern of using either FlinkCluster to execute locally or FlinkSubmitter to submit remotely. St

Null Pointer Exception in tests but only in COLLECTION mode

2015-11-20 Thread André Petermann
Hi all, during a workflow, a data set may run empty, e.g., because of a join without matches. We're using FlinkTestBase and found out that aggregate functions on empty data sets work fine in CLUSTER execution mode but cause a Null Pointer Exception at AggregateOperator$AggregatingUdf in COL

[jira] [Created] (FLINK-3055) ExecutionVertex has duplicate method getParallelSubtaskIndex and getSubTaskIndex

2015-11-20 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-3055: Summary: ExecutionVertex has duplicate method getParallelSubtaskIndex and getSubTaskIndex Key: FLINK-3055 URL: https://issues.apache.org/jira/browse/FLINK-3055 Project: F

Re: How to modify TestBaseUtils.TASK_MANAGER_MEMORY_SIZE

2015-11-20 Thread Till Rohrmann
Hi Andre, what you could do is to either add a new startCluster method which takes as a parameter the task manager memory size. Alternatively, and I would prefer this alternative, you could change the startCluster(Configuration, StreamingMode, boolean) method to always overwrite the configuration

[jira] [Created] (FLINK-3054) Remove R (return) type variable from SerializationSchema

2015-11-20 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-3054: Summary: Remove R (return) type variable from SerializationSchema Key: FLINK-3054 URL: https://issues.apache.org/jira/browse/FLINK-3054 Project: Flink Issu

How to modify TestBaseUtils.TASK_MANAGER_MEMORY_SIZE

2015-11-20 Thread André Petermann
Hi all, in longer workflows one runs into a "too few memory segments" error in tests inheriting from TestBaseUtils. A solution to this problem is increasing TASK_MANAGER_MEMORY_SIZE in the cluster configuration. We are doing this in a custom @BeforeClass method replacing the logic of the TestBas

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Matthias J. Sax
Nice. I would just need to get some feedback about it -- I had to change something in a "hacky way"... Maybe there is a better solution for it... https://github.com/apache/flink/pull/1387 If there is no better idea about solving the naming issue, I would merge it into master (there is no 0.10.1 b

[jira] [Created] (FLINK-3053) SuccessAfterNetworkBuffersFailureITCase.testSuccessfulProgramAfterFailure

2015-11-20 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3053: Summary: SuccessAfterNetworkBuffersFailureITCase.testSuccessfulProgramAfterFailure Key: FLINK-3053 URL: https://issues.apache.org/jira/browse/FLINK-3053 Project:

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Till Rohrmann
If it's not API-breaking, then it can be included imo. On Fri, Nov 20, 2015 at 1:44 PM, Matthias J. Sax wrote: > If we find more bugs later on, we could have a 0.10.2, too. > > +1 for quick bug fix release. > > Question: should bug fix releases contain fixes for core components > only? I would ha

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Matthias J. Sax
If we find more bugs later on, we could have a 0.10.2, too. +1 for quick bug fix release. Question: should bug fix releases contain fixes for core components only? I would have a fix for a bug in Storm compatibility -- not sure if it should be included or not -Matthias On 11/20/2015 12:35 PM, T

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Till Rohrmann
The optimizer bug (https://issues.apache.org/jira/browse/FLINK-3052) should be fixed with https://github.com/apache/flink/pull/1388. On Fri, Nov 20, 2015 at 11:37 AM, Gyula Fóra wrote: > Thanks guys, > > I understand your point and you are probably right, if this is a > lightweight process then

[jira] [Created] (FLINK-3052) Optimizer does not push properties out of bulk iterations

2015-11-20 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3052: Summary: Optimizer does not push properties out of bulk iterations Key: FLINK-3052 URL: https://issues.apache.org/jira/browse/FLINK-3052 Project: Flink Issue

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Gyula Fóra
Thanks guys, I understand your point and you are probably right, if this is a lightweight process then the earlier the better :) Gyula On Fri, Nov 20, 2015 at 11:34 AM Ufuk Celebi wrote: > Hey Gyula, > > I understand your point, but we already have some important fixes for > 0.10.1. It's fair t

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Ufuk Celebi
Hey Gyula, I understand your point, but we already have some important fixes for 0.10.1. It's fair to assume that we will find more issues in the future, but the bugfix releases have way less overhead than the major releases. I would still keep the ASAP schedule and would not wait longer (except f

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Stephan Ewen
From my experience, frequent bugfix releases are highly appreciated by users. There are some pretty serious fixes people are waiting for, and we can certainly do a 0.10.2 in a bit, if people find more issues. On Fri, Nov 20, 2015 at 11:25 AM, Gyula Fóra wrote: > Hi all, > > Wouldn't you think

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Gyula Fóra
Hi all, Wouldn't you think that it would make sense to wait a week or so to find all the hot issues with the current release? To me it feels a little bit like rushing this out and we will have almost the same situation afterwards. I might be wrong but I think people should get a chance to try thi

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Till Rohrmann
Actually, I still have another bug related to the optimizer which I would like to include if possible. The problem is that the optimizer is not able to push properties properly out of a bulk iteration which in some cases can lead to rejected Flink jobs. On Fri, Nov 20, 2015 at 11:10 AM, Robert Met

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Robert Metzger
Great, thank you! Let me know if there is any issue, I'll address it asap. The PR is not building anymore because you've pushed an update to the Kafka documentation. I can rebase and merge the PR once you give me green light ;) Till has merged FLINK-3021, so we might be able to have a first RC to

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Stephan Ewen
Let me look at FLINK-2974 (open PR) to see if it can be merged... On Thu, Nov 19, 2015 at 10:09 PM, Robert Metzger wrote: > Looks like we didn't manage to merge everything today. > > (pending PRs) > - FLINK-3021 Fix class loading issue for streaming sources > - FLINK-2974 Add periodic offset com

Apache Tinkerpop & Geode Integration?

2015-11-20 Thread James Thornton
Hi - This is James Thornton (espeed) from the Apache Tinkerpop project (http://tinkerpop.incubator.apache.org/). The Flink iterators should pair well with Gremlin's Graph Traversal Machine (http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine) -- it would be good