Hi Suijian,
Giraph has several PageRank implementations. I suggest that you use
org.apache.giraph.examples.PageRankComputation, which automatically
checks convergence for you and correctly handles dangling vertices
(vertices without any outlinks).
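To illustrate what "checking convergence" and "handling dangling vertices" mean, here is a minimal standalone sketch in plain Java. It is not Giraph's code: the class name PageRankSketch, the array-based graph representation, and the parameters damping, tolerance and maxIterations are assumptions made for this illustration only. It spreads the rank of dangling vertices uniformly over all vertices and stops once the L1 change between two iterations drops below the tolerance.

```java
import java.util.Arrays;

/** Hypothetical illustration, not part of Giraph. */
final class PageRankSketch {

    /**
     * outLinks[v] lists the targets of vertex v (vertex ids are 0..n-1).
     * Returns a rank vector that sums to 1.
     */
    static double[] pageRank(int[][] outLinks, double damping,
                             double tolerance, int maxIterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from the uniform distribution

        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0;

            for (int v = 0; v < n; v++) {
                if (outLinks[v].length == 0) {
                    // Dangling vertex: collect its rank instead of losing it.
                    danglingMass += rank[v];
                } else {
                    double share = rank[v] / outLinks[v].length;
                    for (int target : outLinks[v]) {
                        next[target] += share;
                    }
                }
            }

            double l1Change = 0.0;
            for (int v = 0; v < n; v++) {
                // Teleportation plus damped incoming rank; the dangling mass
                // is redistributed evenly so the ranks keep summing to 1.
                next[v] = (1.0 - damping) / n
                        + damping * (next[v] + danglingMass / n);
                l1Change += Math.abs(next[v] - rank[v]);
            }

            rank = next;
            if (l1Change < tolerance) {
                break; // converged
            }
        }
        return rank;
    }
}
```

For example, PageRankSketch.pageRank(new int[][] {{1, 2}, {2}, {}}, 0.85, 1e-9, 100) ranks a three-vertex graph in which vertex 2 is dangling.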
PageRankComputation relies on
org.apache.giraph.examples.LongDoubleNullTextInputFormat,
which expects a very simple text file. The format is one line per vertex
with the id of the vertex followed by the ids of adjacent vertices:
src_vertex_id dest_vertex_id_1 dest_vertex_id_2 ...
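To make the line format concrete, here is a small sketch of how one such line can be split into a source id and its adjacent ids. It is a hypothetical helper (AdjacencyLine is a name made up for this example), not the parser Giraph uses, and it assumes that ids are longs separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Giraph: one parsed adjacency-list line. */
final class AdjacencyLine {
    final long sourceId;
    final List<Long> targetIds;

    private AdjacencyLine(long sourceId, List<Long> targetIds) {
        this.sourceId = sourceId;
        this.targetIds = targetIds;
    }

    /** Parses "src dest1 dest2 ..."; a line with only "src" has no outlinks. */
    static AdjacencyLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty line");
        }
        long source = Long.parseLong(tokens[0]);
        List<Long> targets = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            targets.add(Long.parseLong(tokens[i]));
        }
        return new AdjacencyLine(source, targets);
    }

    public static void main(String[] args) {
        AdjacencyLine v = parse("1 2 3");
        System.out.println(v.sourceId + " -> " + v.targetIds); // prints "1 -> [2, 3]"
    }
}
```

Note that no vertex values or edge values appear in the line; only ids are listed.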
See org.apache.giraph.examples.PageRankComputationTest for an example of
how to configure it.
The computation needs org.apache.giraph.examples.RandomWalkWorkerContext as
its worker context and
org.apache.giraph.examples.RandomWalkVertexMasterCompute as its master
compute.
Best,
Sebastian
On 02/26/2014 09:09 PM, Suijian Zhou wrote:
Hi,
I would like to load a graph in the following format (common in
social network graphs) and compute its PageRank:
Src_vertex_id_1 Dest_vertex_id_2 Dest_vertex_id_3 (v1->v2, v1->v3)
Src_vertex_id_2 Dest_vertex_id_4 Dest_vertex_id_5 Dest_vertex_id_6 (v2->v4,
v2->v5, v2->v6)
.....
Do I have to convert the above input format into the following to make it
compatible with Giraph?
[Src_vertex1_id_1, 1, [[Dest_vertex_id_2,0],[Dest_vertex_id_3,0]]]
[Src_vertex1_id_2, 1,
[[Dest_vertex_id_4,0],[Dest_vertex_id_5,0],[Dest_vertex_id_6,0]]]
......
I.e., should I set the initial vertex values to 1 and the edge values to 0? Thanks!
Best Regards,
Suijian