Cassandra sink wrt Counters

2016-05-09 Thread milind parikh
Given FLINK-3311 & FLINK-3332, I am wondering whether it would be possible, without idempotent counters in Cassandra, to deliver an exactly-once sink into Cassandra. I do note that the verbiage/discussion in FLINK-3332 does warn the user that this is not exactly an "exactly once" sink. However my question has to do

Re: writing tests for my program

2016-05-09 Thread Igor Berman
answering my own question: testing a streaming environment should be done with StreamingProgramTestBase & TestStreamEnvironment, which are present in the test package of the flink-streaming-java project, so it's not directly available? Project owners, why not move the above two to flink-test-utils? Or I don't

Local Cluster have problem with connect to elasticsearch

2016-05-09 Thread rafal green
Dear Sir or Madam, Can you tell me why I have a problem with Elasticsearch in a local cluster? I analysed this example: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/elasticsearch2.html My Flink and Elasticsearch configs are default (I only changed node.name to

writing tests for my program

2016-05-09 Thread Igor Berman
Any idea how to handle the following (the message is clear, but I'm not sure what I need to do)? I'm opening a "generic" environment in my code (StreamExecutionEnvironment.getExecutionEnvironment()) and JavaProgramTestBase configures TestEnvironment... so what should I do to support custom tests?

Re: Using ML lib SVM with Java

2016-05-09 Thread Theodore Vasiloudis
Hello Malte, As Simone said there is no Java support currently for FlinkML unfortunately. Regards, Theodore On Mon, May 9, 2016 at 3:05 PM, Simone Robutti wrote: > To my knowledge FlinkML does not support an unified API and most things > must be used exclusively

Run jar job in local cluster

2016-05-09 Thread rafal green
Dear Sir or Madam, Can you tell me why I have a problem with Elasticsearch in a local cluster? I analysed this example: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/elasticsearch2.html My Flink and Elasticsearch configs are default (I only changed node.name to

Re: Streaming job software update

2016-05-09 Thread Aljoscha Krettek
Hi Bruce, you're right, taking down the job and restarting (from a savepoint) with the updated software is the only way of doing it. I'm not aware of any work being done in this area right now, but it is an important topic that we certainly have to tackle in the not-so-far future. Cheers, Aljoscha

Re: Sorted output

2016-05-09 Thread Piyush Shrivastava
This is now solved, thank you. :-) Thanks and Regards, Piyush Shrivastava http://webograffiti.com On Monday, 9 May 2016 3:47 PM, Piyush Shrivastava wrote: Hello all, I have a time series based logic written with Flink. Due to the parallelism, I am not getting

Re: Using ML lib SVM with Java

2016-05-09 Thread Simone Robutti
To my knowledge FlinkML does not support a unified API and most things must be used exclusively with Scala DataSets. 2016-05-09 13:31 GMT+02:00 Malte Schwarzer : > Hi folks, > > I tried to get the FlinkML SVM running - but it didn't really work. The > SVM.fit() method

Re: Creating a custom operator

2016-05-09 Thread Simone Robutti
>- You wrote you'd like to "instantiate a H2O's node in every task manager". This reads a bit like you want to start H2O in the TM's JVM, but I would assume that a H2O node runs as a separate process. So should it be started inside the TM JVM or as an external process next to each TM? Also, do

Re: Creating a custom operator

2016-05-09 Thread Fabian Hueske
Hi Simone, sorry for the delayed answer. I have a few questions regarding your requirements and some ideas that might be helpful (depending on the requirements). 1) Starting / stopping of H2O nodes from Flink - You wrote you'd like to "instantiate a H2O's node in every task manager". This

Sorted output

2016-05-09 Thread Piyush Shrivastava
Hello all, I have a time series based logic written with Flink. Due to the parallelism, I am not getting the output in a proper series. For example, 3> (12:00:00, "value") appears before 1> (11:59:00, "value") while the timestamp of the latter is smaller than the former. I am using TimeWindow and
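
The out-of-order symptom above can be reproduced, and one common remedy sketched, without Flink at all: buffer a window's elements and sort them by event timestamp before emitting. This is a minimal, Flink-free sketch with hypothetical timestamps and payloads, not the Flink windowing API itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedWindowOutput {
    // Stand-in for a stream element: an event timestamp plus a payload.
    static final class Event {
        final String timestamp;
        final String value;
        Event(String timestamp, String value) { this.timestamp = timestamp; this.value = value; }
    }

    public static void main(String[] args) {
        // Arrival order as interleaved by two parallel subtasks (3> before 1>).
        List<Event> window = new ArrayList<>(List.of(
                new Event("12:00:00", "value"),
                new Event("11:59:00", "value")));

        // Sorting the buffered window contents by event time restores the series.
        window.sort(Comparator.comparing((Event e) -> e.timestamp));

        System.out.println(window.get(0).timestamp); // prints 11:59:00
    }
}
```

In Flink terms this corresponds to collecting elements inside a window function and sorting before emitting; across windows, event-time semantics are still needed for a global ordering.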

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Punit Naik
Perfect On Mon, May 9, 2016 at 3:12 PM, Ufuk Celebi wrote: > Yes, I did just that and I used the relevant Flink terminology instead > of #cores and #machines: > > #cores => #slots per TM > #machines => #TMs > > On Mon, May 9, 2016 at 11:33 AM, Punit Naik

Re: Writing Intermediates to disk

2016-05-09 Thread Vikram Saxena
I do not know if I understand completely, but I would create a new DataSet based on filtering the condition and then persist this DataSet. So: DataSet ds2 = dataSet1.filter(condition); ds2.output(...) On Mon, May 9, 2016 at 11:09 AM, Ufuk Celebi wrote: > Flink has

Re: Regarding source Parallelism and usage of coflatmap transformation

2016-05-09 Thread Aljoscha Krettek
Hi, regarding 1) the source needs to implement the ParallelSourceFunction or RichParallelSourceFunction interface to allow it to have a higher parallelism than 1. regarding 2) I wrote a small example that showcases how to do it: final StreamExecutionEnvironment env =

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Ufuk Celebi
Yes, I did just that and I used the relevant Flink terminology instead of #cores and #machines: #cores => #slots per TM #machines => #TMs On Mon, May 9, 2016 at 11:33 AM, Punit Naik wrote: > Yeah, thanks a lot for that. Also if you could, please write the formula, >

Re: Diff between stop and cancel job

2016-05-09 Thread Ufuk Celebi
On Thu, May 5, 2016 at 1:59 AM, Bajaj, Abhinav wrote: > Or can we resume a stopped streaming job ? You can use savepoints [1] to take a snapshot of a streaming program from which you can restart the job at a later point in time. This is independent of whether you cancel

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Punit Naik
Yeah, thanks a lot for that. Also if you could, please write the formula, #cores^2 * #machines * 4, in a different form so that it's more readable and understandable. On 09-May-2016 2:54 PM, "Ufuk Celebi" wrote: > On Mon, May 9, 2016 at 11:05 AM, Punit Naik

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Ufuk Celebi
On Mon, May 9, 2016 at 11:05 AM, Punit Naik wrote: > Thanks for the detailed answer. I will definitely try this and get back to > you. OK, looking forward to it. ;) In the meantime I've updated the docs with a more concise version of what to do when you see this

Re: How to measure Flink performance

2016-05-09 Thread Ufuk Celebi
Hey Prateek, On Fri, May 6, 2016 at 6:40 PM, prateekarora wrote: > I have the below information from Spark. Can I get similar information from > Flink too? If yes, then how can I get it? You can get GC time via the task manager overview. The other metrics don't

Re: Writing Intermediates to disk

2016-05-09 Thread Ufuk Celebi
Flink has support for spillable intermediate results. Currently they are only set if necessary to avoid pipeline deadlocks. You can force this via env.getConfig().setExecutionMode(ExecutionMode.BATCH); This will write shuffles to disk, but you don't get the fine-grained control you probably

Key factors for Flink's performance

2016-05-09 Thread leon_mclare
Hello Flink team, I am currently playing around with Storm and Flink in the context of a smart home. The primary functional requirement is to quickly react to certain properties in stream tuples. I was looking at some benchmarks of the two systems, and generally Flink has the upper hand, in

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Punit Naik
Hi Ufuk Thanks for the detailed answer. I will definitely try this and get back to you. On 09-May-2016 2:08 PM, "Ufuk Celebi" wrote: > Hey Punit, > > you need to give the task managers more network buffers as Robert > suggested. Using the formula from the docs, can you please

Re: assigning stream element to multiple windows of different types

2016-05-09 Thread Aljoscha Krettek
Hi, I think it should (theoretically) work. You would have to provide a custom serializer that can serialize/deserialize your different window subclasses. Also, you will probably have to provide a Trigger that can deal with the different types of windows. Cheers, Aljoscha On Mon, 9 May 2016 at

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Ufuk Celebi
Hey Punit, you need to give the task managers more network buffers as Robert suggested. Using the formula from the docs, can you please use 147456 (96^2 * 4 * 4) for the number of network buffers. Each buffer is 32 KB, meaning that you give 4.5 GB of memory to the network stack. You might have to
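
The numbers in this reply follow the docs' rule of thumb, #slots-per-TM^2 * #TMs * 4; the 96-slots-per-TM and 4-TMs split is inferred from the 96^2 * 4 * 4 product shown in the thread. A quick sketch of the arithmetic:

```java
public class NetworkBufferSizing {
    public static void main(String[] args) {
        int slotsPerTaskManager = 96; // inferred from 96^2 * 4 * 4 in the thread
        int taskManagers = 4;         // assumed; the thread only shows the product
        int bufferSizeKb = 32;        // each network buffer is 32 KB

        // Docs rule of thumb: #slots-per-TM^2 * #TMs * 4
        int buffers = slotsPerTaskManager * slotsPerTaskManager * taskManagers * 4;
        double gigabytes = buffers * (double) bufferSizeKb / (1024 * 1024);

        System.out.println(buffers + " buffers = " + gigabytes + " GB"); // 147456 buffers = 4.5 GB
    }
}
```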

Re: reading from latest kafka offset when flink starts

2016-05-09 Thread Ufuk Celebi
Robert, what do you think about adding a note about this to the Kafka consumer docs? This has come up a couple of times on the mailing list already. – Ufuk On Fri, May 6, 2016 at 12:07 PM, Balaji Rajagopalan wrote: > Thanks Robert appreciate your help. > > On
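
For readers who reach this thread before a docs note lands: when a consumer group has no committed offsets, Kafka's auto.offset.reset property decides where consumption starts, and "latest" skips the backlog. A hedged sketch of the properties involved (the bootstrap.servers and group.id values are placeholders, and this assumes the Flink Kafka consumer passes these properties through to Kafka):

```java
import java.util.Properties;

public class KafkaStartFromLatest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-flink-job");            // placeholder
        // With no committed offsets for this group, "latest" starts at the
        // head of each partition instead of replaying the backlog.
        props.setProperty("auto.offset.reset", "latest");

        // These Properties would be handed to the Kafka consumer constructor.
        System.out.println(props.getProperty("auto.offset.reset"));
    }
}
```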

Re: Restart Flink in Yarn

2016-05-09 Thread Ufuk Celebi
Hey Dominique! Are you running the job in HA mode? – Ufuk On Thu, May 5, 2016 at 1:49 PM, Robert Metzger wrote: > Hi Dominic, > I'm sorry that you ran into this issue. > What do you mean by "flink streaming routes" ? > > Regarding the second question: "Now I want to