Re: YARN High Availability

2016-03-31 Thread Konstantin Knauf
Hi everyone, we are running into some problems with multiple per-job yarn sessions, too. When we are starting a per-job yarn session (Flink 1.0, Hadoop 2.4) with recovery.zookeeper.path.root other than /flink, the yarn session starts but no job is submitted, and after 1 min or so the session cr

wait until BulkIteration finishes

2016-03-31 Thread Lydia Ickler
Hi all, is there a way to tell the program that it should wait until the BulkIteration finishes before the rest of the program is executed? Best regards, Lydia

Re: withBroadcastSet for a DataStream missing?

2016-03-31 Thread Till Rohrmann
Hi Stavros, you might be able to solve your problem using a CoFlatMap operation with iterations. You would use one of the inputs for the iteration on which you broadcast the model updates to every operator. On the other input you would receive the data points which you want to cluster. As output y
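For illustration, a rough sketch of that CoFlatMap idea against the Flink 1.0 DataStream API; the model representation (an array of centroids) and the nearest-centroid assignment are only placeholders for whatever clustering logic is actually used:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// Input 1: broadcast model updates (centroids); input 2: data points to cluster.
public class ClusteringCoFlatMap
        implements CoFlatMapFunction<double[][], double[], Tuple2<double[], Integer>> {

    private double[][] centroids;

    @Override
    public void flatMap1(double[][] modelUpdate, Collector<Tuple2<double[], Integer>> out) {
        centroids = modelUpdate;   // keep the latest broadcast model
    }

    @Override
    public void flatMap2(double[] point, Collector<Tuple2<double[], Integer>> out) {
        if (centroids == null) {
            return;                // no model received yet
        }
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double dist = 0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[i][d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = i; }
        }
        out.collect(new Tuple2<>(point, best));   // point plus its cluster index
    }
}

// Wiring, roughly: modelUpdates.broadcast().connect(points).flatMap(new ClusteringCoFlatMap())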

Re: Upserts with Flink-elasticsearch

2016-03-31 Thread Aljoscha Krettek
@suneel if it is possible to add upsert support for ES 1.x I would go for it, yes. On Tue, 29 Mar 2016 at 16:36 Zach Cox wrote: > You can just create a new UpdateRequest instance directly using its > constructor [1] like this: > > return new UpdateRequest() > .index(index

Re: ContinuousProcessingTimeTrigger does not fire

2016-03-31 Thread Aljoscha Krettek
Hi, yes, right now this is expected behavior. But I see that it can be a bit, well, unexpected. The continuous trigger is only set when new elements arrive, so only when you put new elements does the trigger fire again after five seconds. If you want it to truly continuously fire every five secon

Re: Implicit inference of TypeInformation for join keys

2016-03-31 Thread Aljoscha Krettek
I'm afraid there is no way around having that extra ".apply" because the Scala compiler will get confused with the additional implicit parameter. It's a bit ugly, though ... On Wed, 30 Mar 2016 at 18:34 Timur Fayruzov wrote: > Actually, there is an even easier solution (which I saw in your reply

Re: Upserts with Flink-elasticsearch

2016-03-31 Thread HungChang
Without indexRequest ES2 throws a `document does not exist` exception. Based on https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-update.html#java-docs-update-api-upsert the upsert works, although I am not sure it's the best way. return new UpdateRequest()
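For reference, a minimal sketch of that upsert pattern with the Elasticsearch Java client; index/type/id and the document map are placeholders, and the exact source()/doc() overloads should be checked against the client version bundled with the connector:

import java.util.Map;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.update.UpdateRequest;

public static UpdateRequest buildUpsert(String index, String type, String id, Map<String, Object> doc) {
    IndexRequest insert = new IndexRequest(index, type, id).source(doc);
    return new UpdateRequest(index, type, id)
            .doc(doc)          // applied when the document already exists
            .upsert(insert);   // used as the initial document when it does not
}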

Re: wait until BulkIteration finishes

2016-03-31 Thread Till Rohrmann
Hi Lydia, all downstream operators which depend on the bulk iteration will wait implicitly until data from the iteration operator is available. Cheers, Till On Thu, Mar 31, 2016 at 9:39 AM, Lydia Ickler wrote: > Hi all, > > is there a way to tell the program that it should wait until the > Bul
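As a small illustration of that behavior, a DataSet sketch (the step function is just a placeholder):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.IterativeDataSet;

public class IterationExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        IterativeDataSet<Long> iteration = env.generateSequence(1, 10).iterate(100);
        DataSet<Long> result = iteration.closeWith(
            iteration.map(new MapFunction<Long, Long>() {
                public Long map(Long x) { return x + 1; }   // placeholder step function
            }));

        // This print consumes the iteration's result, so it implicitly waits
        // until the bulk iteration has terminated.
        result.print();
    }
}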

Re: Why Scala Option is not a valid key?

2016-03-31 Thread Till Rohrmann
Actually I think that it’s not correct that the OptionType cannot be used as a key type. In fact it is similar to a composite type and should be usable as a key iff its element can be used as a key. Then we only have to provide an OptionTypeComparator which will compare the elements if they are se

Re: DataSetUtils zipWithIndex question

2016-03-31 Thread Till Rohrmann
Hi Tarandeep, the number of elements in each partition should stay constant. In fact the elements in each partition should not change. Cheers, Till On Wed, Mar 30, 2016 at 8:14 AM, Tarandeep Singh wrote: > Hi, > > I am looking at implementation of zipWithIndex in DataSetUtils- > > https://gith

Re: DataSetUtils zipWithIndex question

2016-03-31 Thread Flavio Pompermaier
Hi Till and Tarandeep, I'm also interested in better understanding the concept of a partition. From what I know, a partition is the portion of data assigned by the job manager to each task manager, right? Then, each partition is divided again at the task manager to maximize the

Re: DataSetUtils zipWithIndex question

2016-03-31 Thread Till Rohrmann
A partition is the portion of data each task receives. Thus, the degree of parallelism of your program/task decides how many different partitions you have. Depending on the upstream operators (and which data is sent to which task), the partitions will most likely differ in size. Cheers, Till On T
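For completeness, the call being discussed, in a minimal sketch (parallelism and the sample data are arbitrary):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.DataSetUtils;

public class ZipWithIndexExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);   // 4 parallel tasks -> 4 partitions

        DataSet<String> data = env.fromElements("a", "b", "c", "d", "e", "f");

        // A first pass counts the elements per partition; a second pass then assigns
        // consecutive, unique Long indices based on those counts.
        DataSet<Tuple2<Long, String>> indexed = DataSetUtils.zipWithIndex(data);
        indexed.print();
    }
}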

Re: ContinuousProcessingTimeTrigger does not fire

2016-03-31 Thread Hironori Ogibayashi
Aljoscha, Thanks for your response. I understood that the trigger is only set when new elements arrive, but in my previous example the trigger fired at 20:51:40.002, and then new elements arrived at 20:51:41, 42, 43. So why was the next trigger not set at 20:51:45? It looks like the following situation. - 20:51:

Re: ContinuousProcessingTimeTrigger does not fire

2016-03-31 Thread Aljoscha Krettek
Oh I see what you mean now. I think the problem is that onProcessingTime changes nextFireTimestamp without actually setting a Trigger, as you said. I think changing onProcessingTime to this should have the correct result: @Override public TriggerResult onProcessingTime(long time, W window, Trigge
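The preview cuts the code off, but the gist of such a change would be along these lines (a sketch against the Trigger API of that time, not the actual patch; `interval` stands for the trigger's configured firing interval):

@Override
public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
    // Re-register the next processing-time timer so the trigger keeps firing
    // every `interval` milliseconds even when no new elements arrive.
    ctx.registerProcessingTimeTimer(time + interval);
    return TriggerResult.FIRE;
}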

Re: DataSetUtils zipWithIndex question

2016-03-31 Thread Flavio Pompermaier
Ok, thanks for the clarification Till! On Thu, Mar 31, 2016 at 2:14 PM, Till Rohrmann wrote: > A partition is the portion of data each task receives. Thus, the degree of > parallelism of your program/task decides how many different partitions you > have. Depending on the upstream operators (and

Re: wait until BulkIteration finishes

2016-03-31 Thread Lydia Ickler
Hi Till, thanks for your reply! Is there a way to store intermediate results of the bulk iteration to use them in the next iteration, other than the data set that is already forwarded by default? Best regards, Lydia > On 31.03.2016 at 12:01, Till Rohrmann wrote: > > Hi Lydia, > > all downstream opera

Re: wait until BulkIteration finishes

2016-03-31 Thread Till Rohrmann
I think I don't completely understand your question. On Thu, Mar 31, 2016 at 4:40 PM, Lydia Ickler wrote: > Hi Till, > > thanks for your reply! > Is there a way to store intermediate results of the bulk iteration to use > then in the next iteration except the data set one sends already by defaul

scaling a flink streaming application on a single node

2016-03-31 Thread Shinhyung Yang
Dear flink users and developers, I am trying to test scaling a flink streaming application on a single node, and here I summarize my configuration and preliminary results. It would be really helpful if you could take some time to review my settings. test application: flink-1.0.0/examples/streaming/Word

Re: ContinuousProcessingTimeTrigger does not fire

2016-03-31 Thread Hironori Ogibayashi
Aljoscha, Thank you. That change looks good. I will try. Regards, Hironori 2016-03-31 22:20 GMT+09:00 Aljoscha Krettek : > Oh I see what you mean now. I think the problem is that onProcessingTime > changes nextFireTimestamp without actually setting a Trigger, as you said. > > I think changing on

Re: scaling a flink streaming application on a single node

2016-03-31 Thread Stephan Ewen
Hi! How many machines are you using for this? The fact that you are giving 64 slots to each TaskManager means that a single TaskManager may end up executing all 64 pipelines. That would heavily overload that TaskManager and cause heavy degradation. If, for example, you use 16 machines, then give
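Expressed as flink-conf.yaml entries, the suggested setup for 16 machines (one TaskManager each) would look roughly like this; the values are only an example:

taskmanager.numberOfTaskSlots: 4
parallelism.default: 64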

Re: wait until BulkIteration finishes

2016-03-31 Thread Stephan Ewen
Hi Lydia! The same function instances (for example MapFunction objects) are used across all supersteps. If, for example, you store something in a HashMap inside some MapFunction, you can access that in the next iteration superstep. You can figure out when a superstep finished and when the next su
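A sketch of that pattern, assuming a rich function used inside a DataSet iteration (inside iterations, open() is invoked at the beginning of every superstep, and the superstep number is available from the iteration runtime context):

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class SuperstepAwareMapper extends RichMapFunction<Long, Long> {

    // Survives across supersteps because the same function instance is reused.
    private final Map<Long, Long> counts = new HashMap<>();

    @Override
    public void open(Configuration parameters) {
        // Called at the start of each superstep when used inside an iteration.
        int superstep = getIterationRuntimeContext().getSuperstepNumber();
        System.out.println("Starting superstep " + superstep + ", cached entries: " + counts.size());
    }

    @Override
    public Long map(Long value) {
        Long previous = counts.get(value);
        counts.put(value, previous == null ? 1L : previous + 1L);   // visible in later supersteps
        return value;
    }
}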

Re: TopologyBuilder throws java.lang.ExceptionInInitializerError

2016-03-31 Thread Sharma, Samiksha
Yes, I tried cleaning the repo but that did not help either; please find the attached pom.xml for your reference. Thanks Samiksha From: Till Rohrmann <trohrm...@apache.org> Reply-To: "user@flink.apache.org" <user@flink.apache.org> Date: Wednesday, March 2

Re: scaling a flink streaming application on a single node

2016-03-31 Thread Shinhyung Yang
Thank you for replying! I am trying to do this on a single machine in fact. Since it has 64 cores, it would be interesting to look at the performance in that regard. > How many machines are you using for this? > > The fact that you are giving 64 slots to each TaskManager means that a > single Tas

Re: Implicit inference of TypeInformation for join keys

2016-03-31 Thread Timur Fayruzov
I'm very content with an extra `apply`, it's much cleaner than any of my initial solutions. On Thu, Mar 31, 2016 at 2:18 AM, Aljoscha Krettek wrote: > I'm afraid there is no way around having that extra ".apply" because the > Scala compiler will get confused with the additional implicit paramete

Multiple operations on a WindowedStream

2016-03-31 Thread Kanak Biscuitwala
Hi, I would like to write something that does a word-count-like computation and then emits only the 10 highest counts for each window. Logically, I would want to do something like: stream.timeWindow(Time.of(1, TimeUnit.MINUTES), Time.of(5, TimeUnit.SECONDS)).sum(2).apply(getTopK(10)) However,
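One common workaround (sketched below; the exact AllWindowFunction signature differs slightly between Flink versions, so this needs adapting to the version in use) is to compute the per-key sums in the keyed window first and then run a second, non-parallel window over the aggregated counts to pick the top 10:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// counts: DataStream<Tuple2<String, Integer>> produced by keyBy(0).timeWindow(...).sum(1)
DataStream<Tuple2<String, Integer>> topK = counts
    .timeWindowAll(Time.minutes(1), Time.seconds(5))
    .apply(new AllWindowFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, TimeWindow>() {
        @Override
        public void apply(TimeWindow window,
                          Iterable<Tuple2<String, Integer>> values,
                          Collector<Tuple2<String, Integer>> out) {
            // Collect the window's counts, sort descending by count, emit the first 10.
            List<Tuple2<String, Integer>> list = new ArrayList<>();
            for (Tuple2<String, Integer> v : values) {
                list.add(v);
            }
            Collections.sort(list, new Comparator<Tuple2<String, Integer>>() {
                public int compare(Tuple2<String, Integer> a, Tuple2<String, Integer> b) {
                    return b.f1 - a.f1;
                }
            });
            for (int i = 0; i < Math.min(10, list.size()); i++) {
                out.collect(list.get(i));
            }
        }
    });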

Re: Kafka Test Error

2016-03-31 Thread Zhun Shen
Hi there, I checked my build.gradle file; I use 'org.apache.flink:flink-connector-kafka-0.9_2.10:1.0.0’, but I found that this lib is based on kafka-clients 0.9.0.1. I want to use Flink streaming to consume Kafka’s events in realtime, but I’m confused by Flink’s libs with different versions. Whic

Re: Kafka Test Error

2016-03-31 Thread Ashutosh Kumar
I am using flink 1.0.0 with kafka 0.9. It works fine for me. I use the following dependency (groupId artifactId version scope): org.apache.flink flink-connector-kafka-0.9_2.10 1.0.0 provided Thanks Ashutosh On Fri, Apr 1, 2016 at 10:46 AM, Zhun Shen wrote: > Hi there, > > I check

Re: wait until BulkIteration finishes

2016-03-31 Thread Lydia Ickler
Hi, thanks for your reply! Could you please give me an example for the close() step? I can’t find an example online, only for open(). Is that where I can "save" my new result? Best regards, Lydia > On 31.03.2016 at 18:16, Stephan Ewen wrote: > > Hi Lydia! > > The same function instances (for exampl