val grouped = R.groupBy[VertexId](G).persist(StorageLeve.MEMORY_ONLY_SER) // or whatever persistence makes more sense for you ... while(true) { val res = grouped.flatMap(F) res.collect.foreach(func) if(criteria) break }
On Thu, Feb 26, 2015 at 10:56 AM, Vijayasarathy Kannan <kvi...@vt.edu> wrote: > Hi, > > I have the following use case. > > (1) I have an RDD of edges of a graph (say R). > (2) do a groupBy on R (by say source vertex) and call a function F on each > group. > (3) collect the results from Fs and do some computation > (4) repeat the above steps until some criteria is met > > In (2), the groups are always going to be the same (since R is grouped by > source vertex). > > Question: > Is R distributed every iteration (when in (2)) or is it distributed only > once when it is created? > > A sample code snippet is below. > > while(true) { > val res = R.groupBy[VertexId](G).flatMap(F) > res.collect.foreach(func) > if(criteria) > break > } > > Since the groups remain the same, what is the best way to go about > implementing the above logic? >