Several parameters limit the number of outlinks kept per page (100 by
default, IIRC) as well as the number of inlinks considered when inverting
links. Have a look at nutch-default.xml and override the relevant values in
nutch-site.xml.
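
As a rough sketch of what such an override in nutch-site.xml might look like:
the property names below are the ones I would expect to find in
nutch-default.xml for Nutch 1.x, but please check them (and their defaults)
against the copy shipped with your version. The values are only illustrative.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed name: cap on outlinks processed per page (default believed
       to be 100); a negative value is assumed to remove the cap. -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>-1</value>
  </property>

  <!-- Assumed name: maximum number of inlinks kept per URL when the
       links are inverted. -->
  <property>
    <name>db.max.inlinks</name>
    <value>10000</value>
  </property>

  <!-- Assumed name: when true, links between pages on the same host are
       skipped while building the linkdb, which can make it look as if
       links are missing. -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
</configuration>
```

Changes like these would most likely only affect segments parsed and link
databases built after the edit, so the crawl or at least the link inversion
may need to be re-run before the output changes.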

On 22 August 2011 20:31, abhayd <[email protected]> wrote:

> hi
> thx for response.
>
> i just ran
> bin/crawl urls -dir crawl -depth 20 -threads 10
>
> And used readlinkdb.
>
> My understanding from the Nutch 1.3 tutorial is that if I use bin/crawl
> (and not the step-by-step approach) I don't have to do any other steps for
> indexing or reading the crawl db.
>
> I am doing this
> -----------------------------------------------------
> 1. bin/crawl urls -dir crawl -depth 20 -threads 10
> 2. bin/nutch solrindex http://localhost:8080/solr/core3 crawl/crawldb
> crawl/linkdb crawl/segments/*
>
> Is that the correct approach?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/readdblink-not-showing-alllinks-tp3274127p3276112.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
