Hi,

We read that "its benefit to cost ratio is very low" [1]. In our experience
there is very little cost, so would the benefit be even lower? Running
countless iterations of link analysis takes many hours, whereas running the
Loops job with a depth of two takes much less time.
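
For concreteness, this is roughly how we invoke the two jobs (a sketch with
illustrative paths; the class names are from the org.apache.nutch.scoring.webgraph
package, and the property name is as we understand it from nutch-default.xml):

  # loop detection, with the depth of two mentioned above
  bin/nutch org.apache.nutch.scoring.webgraph.Loops -webgraphdb crawl/webgraphdb

  # iterative link analysis; the iteration count is controlled by
  # link.analyze.num.iterations, if we read the config right
  bin/nutch org.apache.nutch.scoring.webgraph.LinkRank -webgraphdb crawl/webgraphdb

  # write the resulting scores back into the whole CrawlDB
  bin/nutch org.apache.nutch.scoring.webgraph.ScoreUpdater -crawldb crawl/crawldb -webgraphdb crawl/webgraphdb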

The Loops job may be computationally expensive, but the iterations of link
analysis (plus writing the scores back to the whole CrawlDB) consume a _lot_
more I/O time. Can anyone (Dennis?) provide more details and explain why it's
discouraged for production systems with billions of link nodes?

[1]: http://wiki.apache.org/nutch/NewScoring#Loops

Thanks
