date:20180523

Re: Efficient Stateful Processing with Time-Series Data and Enrichments

2018-05-23 Thread Mike Urbach

Hi Sihua, I will test keying by device ID. I was trying to implement this suggestion: https://stackoverflow.com/a/49395606, but I guess that may be unnecessary in my case. Thanks, Mike On Wed, May 23, 2018 at 11:30 PM, sihua zhou wrote: > Hi Mike, > if I am not misunderstanding, you are doing agg

cep performance

2018-05-23 Thread aitozi

Hi guys, I have been using the CEP library these days, and I find that every time we receive an element from upstream, we query RocksDB for the NFA state, and if the state has changed we also have to update RocksDB. Does this make the performance of CEP poor? cc @Kl0 -- Sent from: http://apache-flin

Re:Efficient Stateful Processing with Time-Series Data and Enrichments

2018-05-23 Thread sihua zhou

Hi Mike, if I am not misunderstanding, you are doing an aggregation for every device on the stream. You mentioned that you want to use MapState to store the state for each device ID? This is a bit confusing to me; I think what you need may be a ValueState. In Flink, every keyed state (Value, MapStat
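
As a rough illustration of the suggestion above (keyed state in Flink is scoped to the current key, so a single ValueState under a keyBy on device ID already holds one value per device), here is a toy Python model. It is not Flink's API: `KeyedValueState`, `set_current_key`, and `running_sum_per_device` are hypothetical names invented for this sketch, and the running sum only stands in for whatever aggregation is actually needed.

```python
class KeyedValueState:
    """Toy stand-in for a per-key value state (hypothetical, not Flink's API)."""

    def __init__(self):
        self._values = {}         # one slot per key, managed by the "runtime"
        self._current_key = None  # key of the element being processed

    def set_current_key(self, key):
        self._current_key = key

    def value(self):
        # Returns None when nothing has been stored for the current key yet.
        return self._values.get(self._current_key)

    def update(self, new_value):
        self._values[self._current_key] = new_value


def running_sum_per_device(events):
    """Process (device_id, reading) pairs, emitting a running sum per device."""
    state = KeyedValueState()
    emitted = []
    for device_id, reading in events:
        # The runtime selects the key before user code runs, so the user code
        # below never has to index a map by device ID itself.
        state.set_current_key(device_id)
        total = (state.value() or 0) + reading
        state.update(total)
        emitted.append((device_id, total))
    return emitted


events = [("a", 1), ("b", 10), ("a", 2), ("b", 5)]
results = running_sum_per_device(events)
```

The user code only ever sees a single value, and the per-device separation comes from the key selection. That is why an extra map keyed by device ID inside the state is usually redundant when the stream is already keyed by device ID.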

Efficient Stateful Processing with Time-Series Data and Enrichments

2018-05-23 Thread Mike Urbach

Hi, I have a two-part question related to processing and storing large amounts of time-series data. The first part is related to the preferred way to keep state on the time-series data in an efficient way, and the second part is about how to further enrich the processed data and feed it back into

Re: increasing parallelism increases the end2end latency in flink sql

2018-05-23 Thread Yan Zhou [FDS Science]

The BoundedOutOfOrdernessTimestampExtractor is assigned to the DataStream after the Kafka consumer. The graph looks like: KafkaSource -> map2Pojo -> BoundedOutOfOrdernessTimestampExtractor -> Table -> .. From: Yan Zhou [FDS Science] Sent: Wednesday, May 23, 2018 3:2

Re: How to restore state from savepoint with flink SQL

2018-05-23 Thread Yan Zhou [FDS Science]

Thanks for the reply. If I only change the query's upstream and downstream operators, can I restore the query's state from a savepoint? It seems that the translated operators for a query have an auto-generated uid/hash, whose value depends on the operator's location in the graph and its input/output. Best Ya

increasing parallelism increases the end2end latency in flink sql

2018-05-23 Thread Yan Zhou [FDS Science]

Hi, My application assigns timestamps to Kafka events with BoundedOutOfOrdernessTimestampExtractor and then converts them to a table. Finally, a Flink SQL over-window aggregation is run against the table. When I double the parallelism of my Flink application, the end-to-end latency doubles. What c

Starting beam pipeline from savepoint

2018-05-23 Thread borisbolvig

Is it possible to start Beam pipelines from savepoints when running on Flink (v.1.4)? I am running flink in a remote environment, and executing the pipelines as .jars specifying the flink jobmanager via cmd line arguments. This is opposed to passing the .jar to `flink run`. In this way the jar fi

Re: Multiple hdfs

2018-05-23 Thread Rong Rong

+1 on viewfs, I was going to add that :-) To add to this, viewfs can also be used as a federation layer supporting multiple HDFS clusters for checkpoint/savepoint HA purposes. -- Rong On Wed, May 23, 2018 at 6:29 AM, Stephan Ewen wrote: > I think that Hadoop recommends to solve such setup

Re: Decrease initial source read speed

2018-05-23 Thread Andrei Shumanski

Hi Piotr, Thanks! I will try it. It is a bit of an ugly solution, but it may work :) > On 23 May 2018, at 16:11, Piotr Nowojski wrote: > > One more remark. Currently there is an unwritten assumption in Flink that the time > to process records is proportional to the number of bytes. As you noted, this > breaks

Re: Decrease initial source read speed

2018-05-23 Thread Piotr Nowojski

One more remark. Currently there is an unwritten assumption in Flink that the time to process records is proportional to the number of bytes. As you noted, this breaks in the case of mixed workloads (especially with file paths sent as records). There is an interesting workaround for this problem though. You could use

Re: Decrease initial source read speed

2018-05-23 Thread Piotr Nowojski

Hi, Yes, if you have a mixed workload in your pipeline, it is a matter of finding the right balance. The situation will be better in Flink 1.5, but the underlying issue will remain: in 1.5.0 there will also be no way to change the network buffer configuration between stages of a single job. Curre

Re: Multiple hdfs

2018-05-23 Thread Stephan Ewen

I think that Hadoop recommends solving such setups with a viewfs:// that spans both HDFS clusters, so that the two different clusters look like different paths within one file system. This is similar to mounting different file systems into one directory tree in Unix. On Tue, May 22, 2018 at 4:41 PM, Kien

Re: Decrease initial source read speed

2018-05-23 Thread Andrei Shumanski

Hi Piotr, Thank you very much for your response. I will try the new feature of Flink 1.5 when it is released. But I am not sure minimising buffer sizes will work in all scenarios. If I understand correctly, these settings affect the whole Flink instance. We might have a flow like this: S

Re: When is 1.5 coming out

2018-05-23 Thread Fabian Hueske

Hi Vishal, Release candidate 5 (RC5) has been published and the voting period ends later today. Unless we find a blocking issue, we can push the release out today. FYI, if you are interested in the release progress, you can subscribe to the dev mailing list (or just check out the archives at list

Re: Leader Retrieval Timeout with HA Job Manager

2018-05-23 Thread Till Rohrmann

Hi Jason, sorry for the late response. The logs look indeed strange because both JMs are granted leadership without the other getting its leadership revoked. What would be interesting is to take a look at the Znode under `/flink/flink-ha/leader//job_manager_lock` in

When is 1.5 coming out

2018-05-23 Thread Vishal Santoshi

It has been some time, and I would like to know when it will officially be released.

Re: Decrease initial source read speed

2018-05-23 Thread Piotr Nowojski

Hi, Yes, Flink 1.5.0 will come with better tools to handle this problem. Namely, you will be able to limit the “in flight” data by controlling the number of assigned credits per channel/input gate. Even without any configuration, Flink 1.5.0 will buffer less data out of the box, thus mitigating th

Re: strange behavior with jobmanager.rpc.address on standalone HA cluster

2018-05-23 Thread Till Rohrmann

Alright, try to grab the logs if you see this problem reoccurring. I would be interested in understanding why this happens. Cheers, Till On Fri, May 18, 2018 at 9:45 PM, Derek VerLee wrote: > Till, > > Thanks for the response. Sorry for the delayed reply. > > The flink version is 1.3.2, in sta

Re: Fwd: Decrease initial source read speed

2018-05-23 Thread Fabian Hueske

Hi Andrei, With the current version of Flink, there is no general solution to this problem. The upcoming version 1.5.0 of Flink adds a feature called credit-based flow control which might help here. I'm adding @Piotr to this thread who knows more about the details of this new feature. Best, Fabi

Re: Order of events with chanined keyBy calls of same key

2018-05-23 Thread Fabian Hueske

Hi, I've posted an answer on SO. Best, Fabian 2018-05-22 18:11 GMT+02:00 Shimony, Shay : > Hi everyone, > > > > I have this question in StackOverflow, and would be happy if you could > answer. > > https://stackoverflow.com/questions/50340107/order-of- > events-with-chanined-keyby-calls-of-same-

Re: program args size for running jobs

2018-05-23 Thread Fabian Hueske

Hi Esteban, If you need the parameters to configure specific operators (instead of the overall flow), you could pass the parameters as a file using the distributed cache [1]. Note that the docs point to the DataSet (batch) API, but the feature works the same way for DataStream programs as well. Best

Re: How to restore state from savepoint with flink SQL

2018-05-23 Thread Fabian Hueske

Hi, At the moment, you can only restore a query from a savepoint if the query is not modified and the same Flink version is used. Since SQL queries are automatically translated into data flows, it is not transparent to the user which operators will be created. We would need to expose an intermedi

Re: Limitations with Retract Streams on SQL

2018-05-23 Thread Fabian Hueske

Hi Gregory, Rong's analysis is correct. The UNION with duplicate elimination is translated into a UNION ALL and a subsequent grouping operator on all attributes without an aggregation function. Flink assumes that all grouping operators can produce retractions (updates) and window-grouped aggregate
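
The translation described above (a UNION with duplicate elimination becomes a UNION ALL followed by a grouping on all attributes with no aggregation function) can be sketched in plain Python. This is only a model over in-memory tuples, not Flink's planner or runtime, and `union_all` and `group_by_all_attributes` are hypothetical helper names chosen for the sketch.

```python
def union_all(left, right):
    """UNION ALL: concatenate both inputs and keep every duplicate."""
    return list(left) + list(right)


def group_by_all_attributes(rows):
    """Group on the whole row with no aggregate: one output row per group."""
    groups = {}
    for row in rows:
        # The full tuple is the grouping key, so identical rows share a group.
        groups.setdefault(row, row)
    # Dicts preserve insertion order, so rows come out in first-seen order.
    return list(groups.values())


left = [("alice", 1), ("bob", 2)]
right = [("bob", 2), ("carol", 3)]

all_rows = union_all(left, right)
deduplicated = group_by_all_attributes(all_rows)
```

Because the deduplication step is an ordinary grouping operator, it is treated like any other grouping when deciding whether a query can produce updates.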

General forum of the event processing ?

2018-05-23 Thread Esa Heikkinen

Hi, This is a little off topic for this mailing list, but do you know of any good general forum or mailing list for (complex) event processing? Best, Esa


25 matches
