Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

Kaan Sancak Sat, 18 Apr 2020 13:51:21 -0700

Thanks that worked!

I wonder what will be the performance difference if I implement this with 
Stateful Functions. Does anyone knows recent works/papers on similar approach?


Best
Kaan


> On 16 Apr 2020, at 10:00, Yun Gao <yungao...@aliyun.com> wrote:
> 
> 
> Hi Kaan,
> 
>    For the first issue, I think the two implementation should have difference 
> and the first should be slower, but I think which one to use should be depend 
> on your algorithm if it could compute incrementally only with the changed 
> edges. However, as far as I know I think most graph algorithm does not 
> satisfy this property, therefore I think you might have to use the first one. 
> 
>     For the second issue, I think you might use Graph.getVertices() and 
> graph.getEdges() to get the underlying vertices and edges dataset of the 
> graph, then you could do any operations with the two datasets, like join the 
> vertices dataset with the second edge list, and finally create a new Graph 
> with new Graph(updated_vertices, edges, env).
> 
> Best,
>  Yun
> 
> ------------------------------------------------------------------
> From:Kaan Sancak <kaans...@gmail.com>
> Send Time:2020 Apr. 16 (Thu.) 17:17
> To:Till Rohrmann <trohrm...@apache.org>
> Cc:Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user <user@flink.apache.org>
> Subject:Re: Question about Writing Incremental Graph Algorithms using Apache 
> Flink Gelly
> 
> If the vertex type is POJO what happens during the union of the graph? Is 
> there a persistent approach, or can we define a function handle such 
> occasions?
> 
> Would there be a performance difference between two cases:
> 
> 1)
>       Graph graph = … // From edges list
> 
>       graph = graph.runScatterGatherIteration();
> 
>       Graph secondGraph = … // From second edge list
> 
>       graph = graph.union(secondGraph).runScatterGatherIteration()
> 
> 2)
>               
>       Graph graph = … // From edges list
> 
>       graph = graph.runScatterGatherIteration();
> 
>       graph.addEdges(second_edge_list)
> 
>       graph = graph.runScatterGatherIteration();
> 
> 
> Before starting the second scatter-gather, I want to set/reset some fields of 
> the vertex value of the vertices that are effected by edge 
> additions/deletions (or union). It would be good to have a callback function 
> that touches the end-points of the edges that are added/deleted.
> 
> Best
> Kaan
> 
> 
> 
> On Apr 15, 2020, at 11:07 AM, Till Rohrmann <trohrm...@apache.org> wrote:
> 
> Hi Kaan,
> 
> I think what you are proposing is something like this:
> 
> Graph<Long, Double, Double> graph = ... // get first batch
> 
> Graph<Long, Double, Double> graphAfterFirstSG = 
> graph.runScatterGatherIteration();
> 
> Graph<Long, Double, Double> secondBatch = ... // get second batch
> 
> // Adjust the result of SG iteration with secondBatch
> 
> Graph<Long, Double, Double> updatedGraph = 
> graphAfterFirstSG.union/difference(secondBatch));
> 
> updatedGraph.runScatterGatherIteration();
> 
> Then I believe this should work.
> 
> Cheers,
> Till
> 
> On Wed, Apr 15, 2020 at 1:14 AM Kaan Sancak <kaans...@gmail.com> wrote:
> Thanks for the useful information! It seems like a good and fun idea to 
> experiment. I will definitely give it a try.
> 
> I have a very close upcoming deadline and I have already implemented the 
> Scatter-Gather iteration algorithm.
> 
> I have another question on whether we can chain Scatter-Gather or 
> Vertex-Centric iterations.
> Let’s say that we have an initial batch/dataset, we run a Scatter-Gather and 
> obtain graph.
> Using another batch we added/deleted vertices to the graph we obtained. 
> Now we run another Scatter-Gather on the modified graph.
> 
> This is no streaming but a naive way to simulate batch updates that are 
> happening concurrently.
> Do you think it is a feasible way to do this way? 
> 
> Best
> Kaan
> 
> On Apr 13, 2020, at 11:16 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> 
> Hi,
> 
> As you mentioned, Gelly Graph's are backed by Flink DataSets, and therefore
> work primarily on static graphs. I don't think it'll be possible to
> implement incremental algorithms described in your SO question.
> 
> Have you tried looking at Stateful Functions, a recent new API added to
> Flink?
> It supports arbitrary messaging between functions, which may allow you to
> build what you have in mind.
> Take a look at Seth's an Igal's comments here [1], where there seems to be a
> similar incremental graph-processing use case for sessionization.
> 
> Cheers,
> Gordon
> 
> [1]
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Complex-graph-based-sessionization-potential-use-for-stateful-functions-td34000.html#a34017
> 
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
> 
>

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

Reply via email to