About the MapReduce job to prepare the input set: I advocated for this solution instead of implicitly creating non-existent vertices. I believe implicit creation adds a logical path to vertex resolution that has some drawbacks, e.g. you have to check the hashmap for the existence of the destination vertex for each message, which is "fine" now that it's a hashmap, but it's going to be less fine when/if we turn to a TreeMap for out-of-core.
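To make the per-message cost concrete, here is a minimal sketch in plain Java. It is not Giraph's actual vertex resolution code: the class name, the method name and the "inbox" representation are all made up for illustration. It only shows that implicit creation forces one map lookup per delivered message, whichever map backs the vertex store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Giraph code: a vertex is reduced to its list of incoming messages.
public class VertexResolutionSketch {

    /**
     * Delivers msg to dest, creating the destination vertex implicitly if it
     * does not exist yet. Returns true if a vertex had to be created.
     */
    public static boolean deliver(Map<Long, List<Double>> vertices, long dest, double msg) {
        // This lookup is paid for every single message:
        // expected O(1) with a HashMap, O(log n) with a TreeMap.
        List<Double> inbox = vertices.get(dest);
        boolean created = (inbox == null);
        if (created) {
            inbox = new ArrayList<>();
            vertices.put(dest, inbox);
        }
        inbox.add(msg);
        return created;
    }

    public static void main(String[] args) {
        // The same code path works with either map; only the lookup cost differs.
        Map<Long, List<Double>> hashed = new HashMap<>();
        Map<Long, List<Double>> sorted = new TreeMap<>();
        for (Map<Long, List<Double>> store : List.of(hashed, sorted)) {
            deliver(store, 42L, 0.15); // vertex 42 does not exist yet, so it is created
            deliver(store, 42L, 0.85); // second message finds the existing vertex
            System.out.println(store.getClass().getSimpleName() + ": " + store);
        }
    }
}
```

With the input set prepared up front by a MapReduce job, the null check and the implicit creation branch would not be needed at delivery time.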
Unfortunately, the other committers preferred the path that makes life easier in userland, so I guess this solution is not to be considered here either.

On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter <s...@apache.org> wrote:
> On 29.05.2012 13:13, Paolo Castagna wrote:
>> Hi Sebastian
>>
>> Sebastian Schelter wrote:
>>> Why do you only recompute the pageRank in each second superstep? Can we
>>> not use the aggregated value of the dangling nodes from the last superstep?
>>
>> I removed the computing of PageRank values every each second superstep.
>> However, I needed to use a couple of aggregators for the dangling nodes
>> contribution instead of just one: "dangling-current" and "dangling-previous".
>>
>> Each superstep, I need to reset the dangling-current aggregator, at the
>> same time, I need to know the value of the aggregator at a previous
>> superstep.
>
> You can save the value from the previous step in a static variable in
> the WorkerContext before resetting the aggregator.
>
>>
>> I hope it makes sense, let me know if you have a better idea.
>>
>>> Overall I think we're on a good way to a robust, real-world PageRank
>>> implementation, I managed to implement the convergence check with an
>>> aggregator, will post an updated patch soon.
>>
>> I think I've just done it, have a look [1] and let me know if you would have
>> done it differently.
>>
>> Paolo
>>
>> [1]
>> https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90
>>
>

--
Claudio Martella
claudio.marte...@gmail.com