About the MapReduce job to prepare the input set: I did advocate for
this solution instead of implicitly creating non-existent vertices.
I believe implicit creation adds a logical path to vertex resolution
that has some drawbacks, e.g. you have to check the hashmap for the
existence of the destination vertex for each message. That is "fine"
now that it's a hashmap, but it's going to be less fine when/if we
turn to a TreeMap for out-of-core, where each lookup costs O(log n)
instead of expected constant time.
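
To make the per-message check concrete, here is a minimal sketch in plain
Java. It is not the Giraph code: the class VertexResolutionSketch, the
Vertex stand-in and the deliver() method are all hypothetical names made
up for illustration. The vertex store is passed in as a Map so that a
HashMap can be swapped for a TreeMap to see where the lookup cost changes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the Giraph code base. */
class VertexResolutionSketch {
    /** Minimal stand-in for a vertex: it only accumulates message values. */
    static final class Vertex {
        double value;
    }

    private final Map<Long, Vertex> vertices;
    private int implicitlyCreated = 0;

    VertexResolutionSketch(Map<Long, Vertex> vertices) {
        this.vertices = vertices;
    }

    /** Delivers one message, creating the destination vertex if it is missing. */
    void deliver(long targetId, double message) {
        // This existence check runs once per message: expected O(1) with a
        // HashMap, O(log n) with a TreeMap.
        Vertex target = vertices.get(targetId);
        if (target == null) {
            target = new Vertex();
            vertices.put(targetId, target);
            implicitlyCreated++;
        }
        target.value += message;
    }

    int implicitlyCreatedCount() {
        return implicitlyCreated;
    }

    int size() {
        return vertices.size();
    }

    double valueOf(long id) {
        return vertices.get(id).value;
    }

    public static void main(String[] args) {
        Map<Long, Vertex> store = new HashMap<>();
        store.put(1L, new Vertex());
        VertexResolutionSketch sketch = new VertexResolutionSketch(store);
        sketch.deliver(1L, 0.5);   // destination exists
        sketch.deliver(2L, 0.25);  // destination is created implicitly
        System.out.println(sketch.size() + " vertices, "
                + sketch.implicitlyCreatedCount() + " created implicitly");
    }
}
```

With a prepared input set every destination would already be in the map,
so the creation branch would never be taken.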

Unfortunately the other committers preferred the path that makes life
easier for users, so I guess this solution is not to be considered
here either.

On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter <s...@apache.org> wrote:
> On 29.05.2012 13:13, Paolo Castagna wrote:
>> Hi Sebastian
>>
>> Sebastian Schelter wrote:
>>> Why do you only recompute the pageRank in each second superstep? Can we
>>> not use the aggregated value of the dangling nodes from the last superstep?
>>
>> I removed the restriction of computing PageRank values only every second
>> superstep. However, I needed to use a couple of aggregators for the dangling
>> nodes contribution instead of just one: "dangling-current" and
>> "dangling-previous".
>>
>> Each superstep, I need to reset the dangling-current aggregator; at the
>> same time, I need to know the value of the aggregator at the previous
>> superstep.
>
> You can save the value from the previous step in a static variable in
> the WorkerContext before resetting the aggregator.
>
>>
>> I hope it makes sense, let me know if you have a better idea.
>>
>>> Overall I think we're well on the way to a robust, real-world PageRank
>>> implementation. I managed to implement the convergence check with an
>>> aggregator, will post an updated patch soon.
>>
>> I think I've just done it, have a look [1] and let me know if you would have
>> done it differently.
>>
>> Paolo
>>
>>  [1]
>> https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90
>>
>>
>
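
For reference, the save-then-reset idea from the quoted thread can be
sketched in plain Java, without the Giraph aggregator or WorkerContext
classes. Everything here is an assumption for illustration: the class
DanglingMassSketch and its methods are hypothetical, and the update rule
is the usual textbook PageRank formula with the dangling mass spread
evenly over all vertices, not necessarily what the linked
PageRankVertex.java does.

```java
/** Hypothetical sketch of the "dangling-current" / "dangling-previous" pair. */
class DanglingMassSketch {
    private double current;   // mass aggregated during the running superstep
    private double previous;  // mass saved from the superstep before it

    /** Called by each dangling vertex (no out-edges) with its current rank. */
    void aggregate(double rank) {
        current += rank;
    }

    /** Called once between supersteps: save the value, then reset. */
    void endSuperstep() {
        previous = current;
        current = 0.0;
    }

    double current() {
        return current;
    }

    double previous() {
        return previous;
    }

    /**
     * Textbook update (assumed here): the dangling mass of the previous
     * superstep is shared equally among all numVertices vertices.
     */
    static double pageRank(double damping, long numVertices,
                           double incomingSum, double danglingPrevious) {
        return (1.0 - damping) / numVertices
                + damping * (incomingSum + danglingPrevious / numVertices);
    }

    public static void main(String[] args) {
        DanglingMassSketch dangling = new DanglingMassSketch();
        dangling.aggregate(0.2);
        dangling.aggregate(0.1);
        dangling.endSuperstep();
        System.out.println(pageRank(0.85, 4, 0.1, dangling.previous()));
    }
}
```

The only point of the sketch is the ordering in endSuperstep(): the value
has to be copied before the reset, otherwise the next superstep reads zero.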



-- 
   Claudio Martella
   claudio.marte...@gmail.com
