Hi,
I have a two-part question related to processing and storing large amounts
of time-series data. The first part concerns the preferred way to keep
state on the time-series data efficiently, and the second part is
about how to further enrich the processed data and feed it back into
The BoundedOutOfOrdernessTimestampExtractor is assigned to the datastream after
the Kafka consumer. The graph is like:
KafkaSource-> map2Pojo -> BoundedOutOfOrdernessTimestampExtractor -> Table ->
..
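For reference, the watermark behavior of the extractor in the graph above can be sketched in plain Python. This is a conceptual model only, not Flink's actual Java API; the bound of 5 time units and the event timestamps are made-up example values:

```python
# Toy model of bounded-out-of-orderness watermarking (not Flink's API):
# the watermark trails the highest timestamp seen so far by a fixed bound.

class BoundedOutOfOrderness:
    def __init__(self, max_out_of_orderness):
        self.bound = max_out_of_orderness
        self.max_ts = float("-inf")

    def extract_timestamp(self, event_ts):
        # Track the highest event timestamp observed so far.
        self.max_ts = max(self.max_ts, event_ts)
        return event_ts

    def current_watermark(self):
        # Events with a timestamp below this value are considered late.
        return self.max_ts - self.bound

extractor = BoundedOutOfOrderness(5)
for ts in [10, 12, 9, 15]:
    extractor.extract_timestamp(ts)
print(extractor.current_watermark())  # 15 - 5 = 10
```

Out-of-order events (like the `9` above) do not move the watermark backwards; only the maximum timestamp and the bound matter.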
From: Yan Zhou [FDS Science]
Hi,
My application assigns timestamps to Kafka events with a
BoundedOutOfOrdernessTimestampExtractor and then converts them to a table.
Finally, a Flink SQL over-window aggregation is run against the table.
When I double the parallelism of my Flink application, the end-to-end latency
doubles. What
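The over-window aggregation mentioned above can be illustrated with a self-contained Python sketch. Flink executes such a query as a stateful streaming operator; the data and the unbounded-preceding SUM window here are hypothetical examples:

```python
# Toy model of a SQL over-window aggregate such as
#   SUM(v) OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
# Each input row is emitted together with its running aggregate.

def over_window_sum(rows):
    # rows: list of (ts, value), assumed already in event-time order.
    out, running = [], 0
    for ts, v in rows:
        running += v
        out.append((ts, v, running))  # one output row per input row
    return out

result = over_window_sum([(1, 10), (2, 20), (3, 5)])
```

Unlike a GROUP BY window, an over-window produces one output row per input row, which is why watermark progress directly affects its latency.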
Is it possible to start Beam pipelines from savepoints when running on Flink
(v.1.4)?
I am running Flink in a remote environment, and executing the pipelines as
.jars, specifying the Flink jobmanager via command-line arguments. This is
opposed to passing the .jar to `flink run`.
In this way the jar
+1 on viewfs, I was going to add that :-)
To add to this, viewfs can be used as a federation layer to support
multiple HDFS clusters for checkpoint/savepoint HA purposes as well.
--
Rong
On Wed, May 23, 2018 at 6:29 AM, Stephan Ewen wrote:
> I think that Hadoop recommends
Hi Piotr,
Thanks!
I will try it. It is a bit of an ugly solution, but it may work :)
> On 23 May 2018, at 16:11, Piotr Nowojski wrote:
>
> One more remark. Currently there is an unwritten assumption in Flink that the
> time to process records is proportional to the number of bytes. As
One more remark. Currently there is an unwritten assumption in Flink that the
time to process records is proportional to the number of bytes. As you noted,
this breaks in the case of mixed workloads (especially with file paths sent as
records).
There is an interesting workaround for this problem, though. You could
Hi,
Yes, if you have a mixed workload in your pipeline, it is a matter of finding
the right balance. The situation will be better in Flink 1.5, but the
underlying issue will remain as well - in 1.5.0 there will also be no way to
change the network buffer configuration between stages of a single job.
I think that Hadoop recommends solving such setups with a viewfs:// that
spans both HDFS clusters, so that the two different clusters look like
different paths within one file system - similar to mounting different file
systems into one directory tree in Unix.
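For concreteness, such a mount table might look like the following core-site.xml fragment. All cluster, host, and path names here are made up for illustration:

```xml
<!-- Hypothetical viewfs mount table spanning two HDFS clusters.
     Each link maps a path in the unified namespace to one cluster. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://federation</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.federation.link./checkpoints</name>
    <value>hdfs://namenode-a:8020/flink/checkpoints</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.federation.link./savepoints</name>
    <value>hdfs://namenode-b:8020/flink/savepoints</value>
  </property>
</configuration>
```

With this in place, Flink can be pointed at paths like `viewfs://federation/checkpoints` without knowing which physical cluster backs them.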
On Tue, May 22, 2018 at 4:41 PM, Kien
Hi Piotr,
Thank you very much for your response.
I will try the new feature of Flink 1.5 when it is released.
But I am not sure minimising buffer sizes will work in all scenarios.
If I understand correctly, these settings affect the whole Flink instance.
We might have a flow like this:
Hi Vishal,
Release candidate 5 (RC5) has been published and the voting period ends
later today.
Unless we find a blocking issue, we can push the release out today.
FYI, if you are interested in the release progress, you can subscribe to
the dev mailing list (or just check out the archives at
Hi Jason,
sorry for the late response. The logs look indeed strange because both JMs
are granted leadership without the other getting its leadership revoked.
What would be interesting is to take a look at the Znode under
`/flink/flink-ha/leader//job_manager_lock`
It has been some time, and I would like to know when it will officially be
released.
Hi,
Yes, Flink 1.5.0 will come with better tools to handle this problem. Namely,
you will be able to limit the “in flight” data by controlling the number of
credits assigned per channel/input gate. Even without any configuration, Flink
1.5.0 will buffer less data out of the box, thus mitigating
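The credit-based idea can be sketched in a few lines of Python. This is a toy model only; Flink's actual implementation lives in its network stack and differs in many details:

```python
# Toy model of credit-based flow control: the receiver grants credits,
# and the sender may emit one buffer per credit. This bounds the amount
# of in-flight data between the two sides.

from collections import deque

class Receiver:
    def __init__(self, initial_credits):
        self.credits = initial_credits
        self.received = []

    def accept(self, buf):
        # Consuming a buffer uses up one credit.
        self.received.append(buf)
        self.credits -= 1

    def grant(self, n):
        # Called when buffers are processed and freed downstream.
        self.credits += n

def send_all(buffers, receiver):
    blocked = deque()
    for buf in buffers:
        if receiver.credits > 0:
            receiver.accept(buf)
        else:
            blocked.append(buf)  # back-pressure: sender must wait
    return blocked

rx = Receiver(initial_credits=2)
waiting = send_all(["b1", "b2", "b3", "b4"], rx)
# Only 2 buffers are in flight; the rest wait for new credits.
```

The point is that the receiver, not the sender, bounds the in-flight data, so a slow consumer cannot accumulate unbounded buffered records.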
Alright, try to grab the logs if you see this problem reoccurring. I would
be interested in understanding why this happens.
Cheers,
Till
On Fri, May 18, 2018 at 9:45 PM, Derek VerLee wrote:
> Till,
>
> Thanks for the response. Sorry for the delayed reply.
>
> The flink
Hi Andrei,
With the current version of Flink, there is no general solution to this
problem.
The upcoming version 1.5.0 of Flink adds a feature called credit-based flow
control which might help here.
I'm adding @Piotr to this thread who knows more about the details of this
new feature.
Best,
Hi,
I've posted an answer on SO.
Best, Fabian
2018-05-22 18:11 GMT+02:00 Shimony, Shay :
> Hi everyone,
>
> I have this question in StackOverflow, and would be happy if you could
> answer.
>
> https://stackoverflow.com/questions/50340107/order-of-
>
Hi Esteban,
If you need the parameters to configure specific operators (instead of the
overall flow), you could pass the parameters as a file using the
distributed cache [1].
Note, the docs point to the DataSet (batch) API, but the feature works the
same way for DataStream programs as well.
Hi,
At the moment, you can only restore a query from a savepoint if the query
is not modified and the same Flink version is used.
Since SQL queries are automatically translated into data flows, it is not
transparent to the user which operators will be created.
We would need to expose an
Hi Gregory,
Rong's analysis is correct. The UNION with duplicate elimination is
translated into a UNION ALL and a subsequent grouping operator on all
attributes without an aggregation function.
Flink assumes that all grouping operators can produce retractions (updates)
and window-grouped
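The translation described above can be illustrated with a tiny Python sketch on made-up data: grouping on all attributes with no aggregation function is simply duplicate elimination.

```python
# Toy illustration: UNION = UNION ALL + grouping on all columns.
t1 = [(1, "a"), (2, "b")]
t2 = [(2, "b"), (3, "c")]

# UNION ALL keeps duplicates.
union_all = t1 + t2

# Grouping on every attribute (no aggregate) removes them, which is
# exactly what UNION (with duplicate elimination) produces.
union = sorted(set(union_all))
```

Because the grouping operator must be able to retract rows when duplicates arrive, it is treated as an updating operator, which is what causes the restriction discussed above.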
Hi
This is a little bit off-topic for this mailing list, but do you know a good
general forum or mailing list for (complex) event processing?
Best, Esa