Hi - I would not use LinkRank on small-scale crawls, nor on verticals. If
internal links are ignored, there are few links left to score; if they are
not, the graph is too dense.
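
To illustrate the first case, here is a minimal Python sketch. It is not Nutch
code: the URLs and the external_links helper are made up for illustration, and
it only assumes that "internal" means source and target share the same host.

```python
from urllib.parse import urlparse


def external_links(links):
    """Keep only links whose source and target hosts differ."""
    return [
        (src, dst)
        for src, dst in links
        if urlparse(src).hostname != urlparse(dst).hostname
    ]


# Hypothetical toy crawl of two hosts; every URL here is invented.
links = [
    ("http://a.example/", "http://a.example/about"),
    ("http://a.example/about", "http://a.example/"),
    ("http://a.example/", "http://b.example/"),
    ("http://b.example/", "http://b.example/news"),
    ("http://b.example/news", "http://b.example/"),
]

kept = external_links(links)
print(len(links), "links in total,", len(kept), "left after ignoring internal ones")
```

On a crawl restricted to a handful of domains, most links look like the
internal ones above, so very little is left for a link-based score to work with.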

It is only useful - for me/us - for letting the web decide which hosts and
pages are popular, and that means large scale.

On Wednesday 10 September 2014 07:43:34 Lewis John Mcgibbney wrote:
> Hi Markus,
> 
> On Wed, Sep 10, 2014 at 2:00 AM, <user-digest-h...@nutch.apache.org> wrote:
> > Hey Lewis,
> > 
> > We didn't use it in the end, but did run the LinkRank on large amounts of
> > data. We then used the scores generated by it for biasing a deduplication
> > algorithm. We tested it thoroughly and never stumbled on issues that could
> > have been resolved using the Loops algorithm.
> > 
> > Thanks for reply Markus.
> 
> OK so here is the deal, we are currently exhausting vertical crawls on
> around 20-30 domains. We are not obtaining external links at the moment to
> domains outside of those target domains, so I've adjusted the <linkrank>
> properties in nutch-site.xml accordingly along with other related
> properties and config to restrict the crawl as such.
> I am going to experiment with using both options in an attempt to move
> towards attacking this documentation and substantiating upon my own
> understanding.
> Thanks for your reply.
> Lewis
