Hi,

Thanks for that but Looks like it is already available
https://github.com/vasia/gelly-streaming in streaming but I wonder why this
is not part of Flink? there are no releases either.

Thanks!

On Tue, Feb 18, 2020 at 9:13 AM Yun Gao <yungao...@aliyun.com> wrote:

>        Hi Kant,
>
>               As far as I know, I think the current example connected
> components implementation based on DataSet API could not be extended to
> streaming data or incremental batch directly.
>
>               From the algorithm's perspective, if the graph only add edge
> and never remove edge, I think the connected components should be able to
> be updated incrementally when the graph changes: When some edges are added,
> a new search should be started from the sources of the added edges to
> propagate its component ID. This will trigger a new pass of update of the
> following vertices, and the updates continues until no vertices' component
> ID get updated. However, if there are also edge removes, I think the
> incremental computation should not be easily achieved.
>
>               To implement the above logic on Flink, I think currently
> there should be two possible methods:
>                     1) Use DataSet API and DataSet iteration, maintains
> the graph structure and the latest computation result in a storage, and
> whenever there are enough changes to the graph, submits a new DataSet job
> to recompute the result. The job should load the edges, the latest
> component id and whether it is the source of the newly added edges for each
> graph vertex, and then start the above incremental computation logic.
>                     2) Flink also provide DataStream iteration API[1] that
> enables iterating on the unbounded data. In this case the graph
> modification should be modeled as a datastream, and some operators inside
> the iteration should maintain the graph structure and current component id.
> whenever there are enough changes, it starts a new pass of computation.
>
>     Best,
>      Yun
>
>         [1] Flink DataStream iteration,
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/datastream_api.html#iterations
>
> ------------------------------------------------------------------
> From:kant kodali <kanth...@gmail.com>
> Send Time:2020 Feb. 18 (Tue.) 15:11
> To:user <user@flink.apache.org>
> Subject:Can Connected Components run on a streaming dataset using iterate
> delta?
>
> Hi All,
>
> I am wondering if connected components
> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/examples.html#connected-components>
> can run on a streaming data? or say incremental batch?
>
> I see that with delta iteration not all vertices need to participate at
> every iteration which is great but in my case the graph is evolving over
> time other words new edges are getting added over time. If so, does the
> algorithm needs to run on the entire graph or can it simply run on the new
> batch of edges?
>
> Finally, What is the best way to compute connected components on Graphs
> evolving over time? Should I use streaming or batch or any custom
> incremental approach? Also, the documentation take maxIterations as an
> input. How can I come up with a good number for max iterations? and once I
> come up with a good number for max Iterations is the algorithm guaranteed
> to converge?
>
>
> Thanks,
> Kant
>
>
>

Reply via email to