To add to what Markus has said: a true PageRank-type calculation
would run until it converges. LinkRank is iterative but runs a fixed
number of loops, 10 by default. That is enough for many, but not all,
link sets to converge.
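To make the difference between the two stopping rules concrete, here is a rough Python sketch. It is not the Nutch code. The three-page graph, the 0.85 damping value, the even split of a page's score across its outlinks, and all names (LINKS, step, run_fixed, run_until_converged, tolerance) are assumptions made up for illustration.

```python
# Toy link graph (hypothetical): page -> pages it links to.
LINKS = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
DAMPING = 0.85  # assumed damping factor, not read from any config


def step(scores):
    """One pass of (1 - damping) + damping * sum(inlink scores)."""
    inlink_sum = {page: 0.0 for page in LINKS}
    for source, targets in LINKS.items():
        # Assumption: a page passes an equal share of its score to each outlink.
        share = scores[source] / len(targets)
        for target in targets:
            inlink_sum[target] += share
    return {page: (1 - DAMPING) + DAMPING * inlink_sum[page] for page in LINKS}


def total_delta(old, new):
    """Sum of absolute score changes between two passes."""
    return sum(abs(new[page] - old[page]) for page in old)


def run_fixed(iterations=10):
    """Fixed-loop style: stop after a set number of passes."""
    scores = {page: 1.0 for page in LINKS}
    delta = 0.0
    for _ in range(iterations):
        new = step(scores)
        delta = total_delta(scores, new)
        scores = new
    return scores, delta


def run_until_converged(tolerance=1e-6, max_iterations=1000):
    """Convergence style: stop once the scores barely change."""
    scores = {page: 1.0 for page in LINKS}
    for count in range(1, max_iterations + 1):
        new = step(scores)
        delta = total_delta(scores, new)
        scores = new
        if delta < tolerance:
            return scores, count
    return scores, max_iterations


fixed_scores, last_delta = run_fixed(10)
converged_scores, loops_needed = run_until_converged()
print("delta left after 10 loops:", last_delta)
print("loops needed for tolerance 1e-6:", loops_needed)
```

On this toy graph the scores are still moving a little after 10 loops, and the tolerance-based run needs more passes than that. Whether 10 loops is enough in practice depends on the link set and on how small a delta you care about.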
Dennis
On 10/14/2011 08:26 AM, Markus Jelsma wrote:
On Friday 14 October 2011 15:03:16 Thomas Anderson wrote:
I read the wiki (http://wiki.apache.org/nutch/NewScoring#LinkRank), which
states that the LinkRank process is iterative and that scores tend to
converge as it iterates. However, from the source it seems that the
job always reads from the same input path and writes to the same
output path. For instance:
runCounter() reads input from nodes and returns the number of nodes
runInitializer() reads from nodes and initializes inLinkScore
then the iteration (default is 10 loops):
  runInverted() reads from nodes (where inLinkScore is initialized),
  outlinks, and loops; then writes its output to
  linkrank-<random>/inverted
  runAnalysis() reads from nodes (inLinkScore is initialized) and the
  inverted path (from the previous step); then writes its output to
  linkrank-<random>/nodes
The scores for X and Y after the first iteration are (1 - damping) + (damping *
sum(inlinkScore)). Suppose X also links to Y; then sum(inlinkScore) for Y will
change in the next iteration, because X has a new value after the first one.
This is convergence: the deltas between iterations flatten out with each
iteration.
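A small Python sketch of that effect, in case it helps. It is only an illustration, not the LinkRank job itself. The three-page graph (where X links to Y), the 0.85 damping value, the even split of a page's score across its outlinks, and the names LINKS, update and deltas are all assumptions.

```python
# Toy link graph (hypothetical): page -> pages it links to. X links to Y.
LINKS = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
DAMPING = 0.85  # assumed damping factor


def update(scores):
    """Recompute every score from the scores of the previous iteration."""
    inlink_score = {page: 0.0 for page in LINKS}
    for source, targets in LINKS.items():
        for target in targets:
            # Assumption: each outlink carries an equal share of the source score.
            inlink_score[target] += scores[source] / len(targets)
    return {page: (1 - DAMPING) + DAMPING * inlink_score[page] for page in LINKS}


scores = {page: 1.0 for page in LINKS}  # every page starts with the same score
deltas = []  # total absolute change per iteration
for iteration in range(1, 11):
    new_scores = update(scores)
    delta = sum(abs(new_scores[page] - scores[page]) for page in LINKS)
    deltas.append(delta)
    scores = new_scores  # the next iteration reads the updated scores
    print(iteration, {p: round(s, 4) for p, s in scores.items()}, round(delta, 5))
```

The formula is the same in every pass, but its input is not: each pass reads the scores written by the previous one, so the results differ between iterations and the printed delta shrinks as the scores settle.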
It seems to me that, since the same process calculates the scores
each time, the result of LinkRank will be the same at every iteration,
so I can't quite see how the scores would converge. What is the key
place in the code to look at? Or is there any doc that explains this
in more detail?
Thanks.