Hi Kant, there has not been high demand yet and it's always a matter of priority for the scarce manpower. I'd probably get inspired by gelly and implement it on DataStream in your stead.
On Sat, Feb 22, 2020 at 11:23 AM kant kodali <kanth...@gmail.com> wrote: > Hi, > > Thanks for that but Looks like it is already available > https://github.com/vasia/gelly-streaming in streaming but I wonder why > this is not part of Flink? there are no releases either. > > Thanks! > > On Tue, Feb 18, 2020 at 9:13 AM Yun Gao <yungao...@aliyun.com> wrote: > >> Hi Kant, >> >> As far as I know, I think the current example connected >> components implementation based on DataSet API could not be extended to >> streaming data or incremental batch directly. >> >> From the algorithm's perspective, if the graph only add >> edge and never remove edge, I think the connected components should be able >> to be updated incrementally when the graph changes: When some edges are >> added, a new search should be started from the sources of the added edges >> to propagate its component ID. This will trigger a new pass of update of >> the following vertices, and the updates continues until no vertices' >> component ID get updated. However, if there are also edge removes, I think >> the incremental computation should not be easily achieved. >> >> To implement the above logic on Flink, I think currently >> there should be two possible methods: >> 1) Use DataSet API and DataSet iteration, maintains >> the graph structure and the latest computation result in a storage, and >> whenever there are enough changes to the graph, submits a new DataSet job >> to recompute the result. The job should load the edges, the latest >> component id and whether it is the source of the newly added edges for each >> graph vertex, and then start the above incremental computation logic. >> 2) Flink also provide DataStream iteration API[1] >> that enables iterating on the unbounded data. In this case the graph >> modification should be modeled as a datastream, and some operators inside >> the iteration should maintain the graph structure and current component id. >> whenever there are enough changes, it starts a new pass of computation. >> >> Best, >> Yun >> >> [1] Flink DataStream iteration, >> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/datastream_api.html#iterations >> >> ------------------------------------------------------------------ >> From:kant kodali <kanth...@gmail.com> >> Send Time:2020 Feb. 18 (Tue.) 15:11 >> To:user <user@flink.apache.org> >> Subject:Can Connected Components run on a streaming dataset using iterate >> delta? >> >> Hi All, >> >> I am wondering if connected components >> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/examples.html#connected-components> >> can run on a streaming data? or say incremental batch? >> >> I see that with delta iteration not all vertices need to participate at >> every iteration which is great but in my case the graph is evolving over >> time other words new edges are getting added over time. If so, does the >> algorithm needs to run on the entire graph or can it simply run on the new >> batch of edges? >> >> Finally, What is the best way to compute connected components on Graphs >> evolving over time? Should I use streaming or batch or any custom >> incremental approach? Also, the documentation take maxIterations as an >> input. How can I come up with a good number for max iterations? and once I >> come up with a good number for max Iterations is the algorithm guaranteed >> to converge? >> >> >> Thanks, >> Kant >> >> >>