Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

Avery Ching Tue, 29 May 2012 13:22:00 -0700

We did have a related issue (https://issues.apache.org/jira/browse/GIRAPH-155).


On 5/29/12 6:54 AM, Claudio Martella wrote:

I'm not sure it will be necessary to send them on the first superstep.
They'll be created and used in the second superstep if necessary. If
they need it in the first superstep, then I guess they'll put them as
a line in the input file.
I agree with you that this is kind of messed up :)



On Tue, May 29, 2012 at 3:23 PM, Sebastian Schelter <s...@apache.org> wrote:

Oh sorry, I didn't know about that discussion. The problem I see is that
in every implementation, a user might run into this issue, and I don't
think it's ideal to force users to always run a round of sending empty
messages at the beginning.

Maybe the system should (somehow) automagically do that for the users?
Really seems to be an awkward situation though...

--sebastian



On 29.05.2012 15:03, Claudio Martella wrote:

About the MapReduce job to prepare the input set, I did advocate for
this solution instead of implicitly supporting automatic creation of
non-existent vertices (which I believe adds a logical path in vertex
resolution that has some drawbacks, e.g., you have to check the
hashmap for the existence of the destination vertex for each message,
which is "fine" now that it's a hashmap, but it's going to be less
fine when/if we switch to a TreeMap for out-of-core).

Unfortunately, the other committers preferred the path that makes
userland's life easier, so I guess this solution is not to be
considered here either.

On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter <s...@apache.org> wrote:

On 29.05.2012 13:13, Paolo Castagna wrote:

Hi Sebastian

Sebastian Schelter wrote:

Why do you only recompute the pageRank in each second superstep? Can we
not use the aggregated value of the dangling nodes from the last superstep?

I removed the computation of PageRank values only every second superstep.
However, I needed to use a couple of aggregators for the dangling nodes'
contribution instead of just one: "dangling-current" and "dangling-previous".

In each superstep, I need to reset the dangling-current aggregator; at the
same time, I need to know the value the aggregator had at the previous
superstep.

You can save the value from the previous step in a static variable in
the WorkerContext before resetting the aggregator.
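
To make the suggestion concrete, here is a minimal sketch in plain Java, not Giraph code. It only simulates supersteps in a loop. The class DanglingMassSketch and its methods are hypothetical names, and the two static fields are stand-ins for the "dangling-previous" and "dangling-current" aggregators mentioned above; no real Giraph API is used or implied.

```java
import java.util.Arrays;

public class DanglingMassSketch {
    // Hypothetical stand-ins for the two aggregators discussed in the thread.
    static double danglingPrevious = 0.0;
    static double danglingCurrent = 0.0;

    /** Runs before each superstep: keep the old sum, then reset the current one. */
    static void preSuperstep() {
        danglingPrevious = danglingCurrent;
        danglingCurrent = 0.0;
    }

    /** outEdges[v] lists the targets of vertex v; an empty array marks a dangling node. */
    static double[] run(int[][] outEdges, int supersteps, double damping) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        double[] inbox = new double[n];
        danglingPrevious = 0.0;
        danglingCurrent = 0.0;
        for (int step = 0; step < supersteps; step++) {
            preSuperstep();
            double[] outbox = new double[n];
            for (int v = 0; v < n; v++) {
                if (step > 0) {
                    // Use the dangling mass collected in the previous superstep,
                    // spread evenly over all vertices.
                    rank[v] = (1.0 - damping) / n
                            + damping * (inbox[v] + danglingPrevious / n);
                }
                if (outEdges[v].length == 0) {
                    // Dangling node: contribute to the sum instead of sending messages.
                    danglingCurrent += rank[v];
                } else {
                    double share = rank[v] / outEdges[v].length;
                    for (int target : outEdges[v]) {
                        outbox[target] += share;
                    }
                }
            }
            inbox = outbox; // messages are delivered in the next superstep
        }
        return rank;
    }
}
```

With this arrangement the ranks are recomputed in every superstep and still sum to 1, because the mass held by dangling nodes in one superstep is redistributed in the next.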

I hope it makes sense, let me know if you have a better idea.

Overall I think we're well on the way to a robust, real-world PageRank
implementation. I managed to implement the convergence check with an
aggregator and will post an updated patch soon.
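
For readers following along, one common way to do such a check is to sum the absolute change in rank over all vertices and compare it to a tolerance. The sketch below is plain Java under that assumption; it is not taken from the patch, the class ConvergenceSketch and its methods are hypothetical names, and the loop stands in for what a sum aggregator would collect across vertices.

```java
public class ConvergenceSketch {
    /** Sums |new - old| over all vertices, as a sum aggregator would. */
    static double l1Delta(double[] oldRank, double[] newRank) {
        double delta = 0.0;
        for (int v = 0; v < oldRank.length; v++) {
            delta += Math.abs(newRank[v] - oldRank[v]);
        }
        return delta;
    }

    /** Vertices could vote to halt once the aggregated change drops below the tolerance. */
    static boolean converged(double[] oldRank, double[] newRank, double tolerance) {
        return l1Delta(oldRank, newRank) < tolerance;
    }
}
```

The tolerance is a tuning choice; a smaller value means more supersteps before stopping.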

I think I've just done it, have a look [1] and let me know if you would have
done it differently.

Paolo

[1] https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90
