Hi,

We read that "its benefit to cost ratio is very low" [1]. In our experience
there is very little cost, so would the benefit be even lower? Running
countless iterations of link analysis takes many hours, whereas running the
Loops job with a depth of two takes much less time.
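
For concreteness, this is roughly how we invoke the two jobs (a sketch with
illustrative paths; the class names are from the org.apache.nutch.scoring.webgraph
package, and the property name is as we understand it from nutch-default.xml):

  # loop detection, with the depth of two mentioned above
  bin/nutch org.apache.nutch.scoring.webgraph.Loops -webgraphdb crawl/webgraphdb

  # iterative link analysis; the iteration count is controlled by
  # link.analyze.num.iterations, if we read the config right
  bin/nutch org.apache.nutch.scoring.webgraph.LinkRank -webgraphdb crawl/webgraphdb

  # write the resulting scores back into the whole CrawlDB
  bin/nutch org.apache.nutch.scoring.webgraph.ScoreUpdater -crawldb crawl/crawldb -webgraphdb crawl/webgraphdb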

The Loops job may be computationally expensive, but the iterations of link
analysis (plus writing the scores back to the whole CrawlDB) consume a _lot_
more I/O time. Can anyone (Dennis?) provide more details and explain why it's
discouraged for production systems with billions of link nodes?

[1]: http://wiki.apache.org/nutch/NewScoring#Loops

Thanks
