I think the code can be improved partially. You don't have to count the
vertices via an aggregator, you can simply use getNumVertices().

Why do you only recompute the pageRank in each second superstep? Can we
not use the aggregated value of the dangling nodes from the last superstep?

Overall I think we're on a good way to a robust, real-world PageRank
implementation, I managed to implement the convergence check with an
aggregator, will post an updated patch soon.

Best,
Sebastian


On 28.05.2012 18:39, Paolo Castagna wrote:
> Paolo Castagna wrote:
>> Sebastian Schelter wrote:
>>> I guess that summing up and redistributing the pagerank of dangling
>>> vertices can also be done without an extra superstep in an aggregator.
>>
>> Yeah! Why didn't I think about that?
>> Thanks, great suggestion.
>>
>> I am going to give this a go, at first without extending RandomWalkVertex 
>> since I want to see how it might work.
>> This would also inform design and improvements of the current 
>> RandomWalkVertex.
> 
> You can find a 'proper' ;-) implementation of PageRank with dangling nodes
> support, sum at the end of all PageRank values equals to 1.00 (since it's a
> probability distribution) and PageRank values validated against a third party
> implementation (i.e. JUNG), here [1].
> 
> I have not managed to do it as Sebastian suggested yet.
> 
> Paolo
> 
>  [1]
> https://github.com/castagna/jena-grande/blob/3c2d9f85bacb737acd575e3e287dc0fcc6bd96b9/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java
> 

Reply via email to