Hi Sihua,
I will test keying by device ID. I was trying to implement this suggestion:
https://stackoverflow.com/a/49395606, but I guess that may be unnecessary
in my case.
Thanks,
Mike
On Wed, May 23, 2018 at 11:30 PM, sihua zhou wrote:
> Hi Mike,
> if I'm not misunderstanding, you are doing agg
Hi guys,
I am using the CEP library these days, and I find that every time we receive
an element from upstream, we query RocksDB for the NFA state, and if it has
changed we still need to update RocksDB. Doesn't this make the performance of
CEP poor?
cc @Kl0
Hi Mike,
if I'm not misunderstanding, you are doing aggregation for every device on the
stream. You mentioned that you want to use MapState to store the state for
each device ID? This is a bit confusing to me; I think what you need may be a
ValueState. In Flink, every keyed state (Value, MapStat
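As a sketch of Sihua's suggestion, a per-device aggregate with ValueState might look like this (the Event POJO, its field names, and the simple counting logic are illustrative, not from the thread):

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Hypothetical event POJO; only deviceId matters for the keying.
class Event {
    public String deviceId;
    public long value;
}

// Used as: events.keyBy(e -> e.deviceId).flatMap(new DeviceCounter())
public class DeviceCounter extends RichFlatMapFunction<Event, Long> {
    // Flink scopes this single ValueState handle to the current key
    // (the device ID), so no per-key Map bookkeeping is needed.
    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("device-count", Long.class));
    }

    @Override
    public void flatMap(Event event, Collector<Long> out) throws Exception {
        Long current = count.value();                 // null on the first event per key
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);
        out.collect(updated);
    }
}
```

Because the stream is keyed by device ID, one logical ValueState per device is maintained automatically; a MapState would only be needed to keep several sub-entries under a single key.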
Hi,
I have a two-part question related to processing and storing large amounts
of time-series data. The first part is related to the preferred way to keep
state on the time-series data in an efficient way, and the second part is
about how to further enrich the processed data and feed it back into
The BoundedOutOfOrdernessTimestampExtractor is assigned to the datastream after
the Kafka consumer. The graph is like:
KafkaSource -> map2Pojo -> BoundedOutOfOrdernessTimestampExtractor -> Table ->
..
From: Yan Zhou [FDS Science]
Sent: Wednesday, May 23, 2018 3:2
Thanks for the reply.
If I only change the operators upstream and downstream of the query, can I
restore the query's state from a savepoint? It seems like the translated
operators for a query have an auto-generated uid/hash, whose value depends on
their location in the graph and their inputs/outputs.
Best
Ya
Hi,
My application assigns timestamps to Kafka events with a
BoundedOutOfOrdernessTimestampExtractor and then converts them to a table.
Finally, a Flink SQL over-window aggregation is run against the table.
When I double the parallelism of my Flink application, the end-to-end latency
doubles. What c
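For context, the timestamp assignment described above might look roughly like the following sketch (MyEvent, its field name, and the 10-second bound are illustrative, not taken from the thread):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

// events is the DataStream produced by the Kafka source + map-to-POJO step.
DataStream<MyEvent> withTimestamps = events.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
            @Override
            public long extractTimestamp(MyEvent e) {
                return e.eventTimeMillis; // event time in epoch milliseconds
            }
        });
```

The extractor emits watermarks that lag the maximum seen timestamp by the configured bound, which is what drives when the over-window aggregation can emit results.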
Is it possible to start Beam pipelines from savepoints when running on Flink
(v.1.4)?
I am running flink in a remote environment, and executing the pipelines as
.jars specifying the flink jobmanager via cmd line arguments. This is
opposed to passing the .jar to `flink run`.
In this way the jar fi
+1 on viewfs, I was going to add that :-)
To add to this, viewfs can be used as a federation layer for supporting
multiple HDFS clusters for checkpoint/savepoint HA purposes as well.
--
Rong
On Wed, May 23, 2018 at 6:29 AM, Stephan Ewen wrote:
> I think that Hadoop recommends to solve such setup
Hi Piotr,
Thanks!
I will try it. It is a bit of an ugly solution, but it may work :)
> On 23 May 2018, at 16:11, Piotr Nowojski wrote:
>
> One more remark. Currently there is an unwritten assumption in Flink that the
> time to process records is proportional to the number of bytes. As you noted,
> this breaks
One more remark. Currently there is an unwritten assumption in Flink that the
time to process records is proportional to the number of bytes. As you noted,
this breaks in the case of mixed workloads (especially with file paths sent as
records).
There is an interesting workaround for this problem though. You could use
Hi,
Yes, if you have a mixed workload in your pipeline, it is a matter of finding
the right balance. The situation will be better in Flink 1.5, but the
underlying issue will remain: in 1.5.0 there will also be no way to change the
network buffer configuration between stages of a single job.
Curre
I think Hadoop recommends solving such setups with a viewfs:// that spans both
HDFS clusters, so that the two different clusters look like different paths
within one file system. This is similar to mounting different file systems
into one directory tree in Unix.
On Tue, May 22, 2018 at 4:41 PM, Kien
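A viewfs mount table of this kind could be sketched in core-site.xml roughly as follows (the namespace name, cluster names, and paths are illustrative, not from the thread):

```xml
<!-- core-site.xml sketch: one viewfs namespace spanning two HDFS clusters. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://flink-ns</value>
  </property>
  <!-- /checkpoints resolves to cluster A, /savepoints to cluster B. -->
  <property>
    <name>fs.viewfs.mounttable.flink-ns.link./checkpoints</name>
    <value>hdfs://clusterA/flink/checkpoints</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.flink-ns.link./savepoints</name>
    <value>hdfs://clusterB/flink/savepoints</value>
  </property>
</configuration>
```

Flink's checkpoint and savepoint directories can then be given as viewfs:// paths, and the mount table decides which physical cluster each path lands on.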
Hi Piotr,
Thank you very much for your response.
I will try the new feature of Flink 1.5 when it is released.
But I am not sure that minimising buffer sizes will work in all scenarios.
If I understand correctly, these settings affect the whole Flink instance.
We might have a flow like this:
S
Hi Vishal,
Release candidate 5 (RC5) has been published and the voting period ends
later today.
Unless we find a blocking issue, we can push the release out today.
FYI, if you are interested in the release progress, you can subscribe to
the dev mailing list (or just check out the archives at list
Hi Jason,
sorry for the late response. The logs indeed look strange, because both JMs
are granted leadership without the other having its leadership revoked.
What would be interesting is to take a look at the Znode under
`/flink/flink-ha/leader//job_manager_lock`
in
It has been some time, and I would like to know when it will officially be
released.
Hi,
Yes, Flink 1.5.0 will come with better tools to handle this problem. Namely,
you will be able to limit the “in flight” data by controlling the number of
credits assigned per channel/input gate. Even without any configuration, Flink
1.5.0 will buffer less data out of the box, thus mitigating th
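The per-channel/per-gate credits mentioned above map to configuration keys roughly like the following flink-conf.yaml fragment (key names and defaults as I understand the 1.5 configuration; treat them as an assumption and verify against the docs for your version):

```yaml
# Exclusive buffers (credits) granted to each input channel:
taskmanager.network.memory.buffers-per-channel: 2
# Floating buffers shared by all channels of one input gate:
taskmanager.network.memory.floating-buffers-per-gate: 8
```

Lowering these bounds the amount of in-flight data per connection, at the cost of throughput for high-volume stages.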
Alright, try to grab the logs if you see this problem reoccurring. I would
be interested in understanding why this happens.
Cheers,
Till
On Fri, May 18, 2018 at 9:45 PM, Derek VerLee wrote:
> Till,
>
> Thanks for the response. Sorry for the delayed reply.
>
> The flink version is 1.3.2, in sta
Hi Andrei,
With the current version of Flink, there is no general solution to this
problem.
The upcoming version 1.5.0 of Flink adds a feature called credit-based flow
control which might help here.
I'm adding @Piotr to this thread who knows more about the details of this
new feature.
Best, Fabi
Hi,
I've posted an answer on SO.
Best, Fabian
2018-05-22 18:11 GMT+02:00 Shimony, Shay :
> Hi everyone,
>
>
>
> I have this question in StackOverflow, and would be happy if you could
> answer.
>
> https://stackoverflow.com/questions/50340107/order-of-
> events-with-chanined-keyby-calls-of-same-
Hi Esteban,
If you need the parameters to configure specific operators (instead of the
overall flow), you could pass the parameters as a file using the
distributed cache [1].
Note, the docs point to the DataSet (batch) API, but the feature works the
same way for DataStream programs as well.
Best
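A sketch of the distributed-cache approach Fabian describes (the file path, the registered name "params", and the use of a Properties file are illustrative):

```java
import java.io.File;
import java.io.FileInputStream;
import java.util.Properties;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// On the environment, register the file once; Flink ships it to every worker:
//   StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//   env.registerCachedFile("hdfs:///path/to/operator-params.properties", "params");

// Any rich function can then read its local copy by the registered name.
public class ConfiguredMapper extends RichMapFunction<String, String> {
    private final Properties props = new Properties();

    @Override
    public void open(Configuration parameters) throws Exception {
        File f = getRuntimeContext().getDistributedCache().getFile("params");
        try (FileInputStream in = new FileInputStream(f)) {
            props.load(in);
        }
    }

    @Override
    public String map(String value) {
        return props.getProperty("prefix", "") + value;
    }
}
```

Reading the file in open() means it is loaded once per task, not once per record.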
Hi,
At the moment, you can only restore a query from a savepoint if the query
is not modified and the same Flink version is used.
Since SQL queries are automatically translated into data flows, it is not
transparent to the user which operators will be created.
We would need to expose an intermedi
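For comparison, in the DataStream API one can pin operator state to stable identifiers with explicit uids, which is exactly what the SQL translation does not yet expose (the operator and field names below are illustrative):

```java
// Explicit uids make the savepoint-to-operator mapping independent of the
// operator's position in the graph, so surrounding operators can change.
DataStream<Long> result = source
        .map(new ParseFn()).uid("parse-events")
        .keyBy(e -> e.deviceId)
        .flatMap(new DeviceCounter()).uid("count-per-device");
```

Without such uids, Flink derives operator hashes from the graph structure, which is why modifying a SQL query invalidates the savepoint mapping.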
Hi Gregory,
Rong's analysis is correct. The UNION with duplicate elimination is
translated into a UNION ALL and a subsequent grouping operator on all
attributes without an aggregation function.
Flink assumes that all grouping operators can produce retractions (updates)
and window-grouped aggregate
Hi
This is a little bit off-topic for this mailing list, but do you know any good
general forum or mailing list for (complex) event processing?
Best, Esa