Hi Stephan,
An external project would be possible, and it could perhaps be merged in the
future if that makes sense. I just wanted to point out that there is a general
need for this, but I understand the priorities and may also try to work on these.
Best,
Stavros
On Thu, May 26, 2016 at 10:00 PM, Stephan Ewen wrote:
> H
Hi Stavros!
I think what Aljoscha wants to say is that the community is a bit hard
pressed to review new and complex things right now.
There are a lot of threads going on already.
If you want to work on this, why not make your own GitHub project
"Approximate algorithms on Apache Flink" or so?
Gr
Hi,
that link was interesting, thanks! As I said though, it's probably not a
good fit for Flink right now.
The things that I feel are important right now are:
- dynamic scaling: the ability of a streaming pipeline to adapt to changes
in the amount of incoming data. This is tricky with stateful o
Hey Aljoscha,
Thanks for the useful comments. I have recently looked at Spark sketches:
http://www.slideshare.net/databricks/sketching-big-data-with-spark-randomized-algorithms-for-largescale-data-analytics
So there must be value in this effort.
In my experience counting in general is a common need
Hi,
no such changes are planned right now. The separation between the keys is
very strict in order to make the windowing state re-partitionable so that
we can implement dynamic rescaling of the parallelism of a program.
The WindowAll is only used for specific cases where you need a Trigger that
see
Hi, thanks for the feedback.
So there is a limitation due to the parallel windows implementation.
Are there no plans to change that to accommodate similar estimations?
Is WindowAll used in practice as a step in the pipeline? I mean, since it is
inherently not parallel, it cannot scale, correct?
Although there
Hi,
with how the window API currently works this can only be done for
non-parallel windows. For keyed windows everything that happens is scoped
to the key of the elements: window contents are kept in per-key state,
triggers fire on a per-key basis. Therefore a count-min sketch cannot be
used becaus
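
For readers unfamiliar with the data structure under discussion, here is a
minimal count-min sketch in plain Java. It is only an illustration, not Flink
code: the class name, the hash mixing, and the merge method are assumptions
made for this example, and nothing here uses or implies any Flink API. It shows
why the structure is attractive for counting (fixed memory, estimates that
never fall below the true count) and why it is naturally shared across keys
rather than held per key.

```java
/** Illustrative count-min sketch: approximate frequency counts in fixed memory. */
final class CountMinSketch {
    private final int depth;        // number of hash rows
    private final int width;        // counters per row
    private final long[][] counts;  // depth x width counter table

    CountMinSketch(int depth, int width) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.depth = depth;
        this.width = width;
        this.counts = new long[depth][width];
    }

    // Maps an item to a bucket in the given row by mixing hashCode() with the
    // row index. This ad hoc mixing is an assumption of the example; a real
    // implementation would use a pairwise-independent hash family.
    private int bucket(Object item, int row) {
        int h = java.util.Objects.hashCode(item) * 0x9E3779B9 + row * 0x85EBCA6B;
        h ^= (h >>> 16);
        h *= 0x85EBCA6B;
        h ^= (h >>> 13);
        return Math.floorMod(h, width);
    }

    /** Adds 'count' occurrences of 'item' by incrementing one counter per row. */
    void add(Object item, long count) {
        for (int r = 0; r < depth; r++) {
            counts[r][bucket(item, r)] += count;
        }
    }

    /** Returns the minimum counter over all rows; never below the true count. */
    long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int r = 0; r < depth; r++) {
            min = Math.min(min, counts[r][bucket(item, r)]);
        }
        return min;
    }

    /** Adds another sketch of identical dimensions into this one, cell by cell. */
    void merge(CountMinSketch other) {
        if (other.depth != depth || other.width != width) {
            throw new IllegalArgumentException("sketch dimensions differ");
        }
        for (int r = 0; r < depth; r++) {
            for (int c = 0; c < width; c++) {
                counts[r][c] += other.counts[r][c];
            }
        }
    }
}
```

A single sketch instance counts many distinct items at once, which is the
opposite of state that is scoped to one key. The cell-wise merge is a property
of the data structure itself; how, or whether, partial sketches could be
combined inside a windowed pipeline is left open here.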
Hi guys,
I would like to push forward the work here:
https://issues.apache.org/jira/browse/FLINK-2147
Can anyone more familiar with the streaming API verify whether this could be a
mature task?
The intention is to summarize data over a window like in the case of
StreamGroupedFold.
Specifically implement c