Several parameters limit the number of outlinks kept per page (100 by default, IIRC) as well as the number of inlinks considered when inverting links. Have a look at nutch-default.xml and override the relevant values in nutch-site.xml.
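
For example, something along these lines in conf/nutch-site.xml. This is only a sketch: the property names are as I remember them from nutch-default.xml, so please check them against the copy shipped with your version, and the values are purely illustrative.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property name: maximum outlinks processed per page.
       A negative value should mean "no limit". -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>-1</value>
  </property>

  <!-- Assumed property name: maximum inlinks kept per URL in the linkdb
       when inverting links. Illustrative value only. -->
  <property>
    <name>db.max.inlinks</name>
    <value>50000</value>
  </property>
</configuration>
```

The new limits only take effect for pages parsed and links inverted after the change, so you would probably need to re-crawl (or at least rebuild the linkdb) before checking the output again.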

On 22 August 2011 20:31, abhayd <[email protected]> wrote:

> hi
> thx for response.
>
> i just ran
> bin/crawl urls -dir crawl -depth 20 -threads 10
>
> And used readdblink.
>
> My understanding from Nutch 1.3 tutorial is if i use bin/crawl ( and not
> step by step approach) i dont have to do any other steps for indexing or
> reading crawl db.
>
> I am doing this
> -----------------------------------------------------
> 1. bin/crawl urls -dir crawl -depth 20 -threads 10
> 2.bin/nutch solrindex http://localhost:8080/solr/core3 crawl/crawldb
> crawl/linkdb crawl/segments/*
>
> Is that the correct approach?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/readdblink-not-showing-alllinks-tp3274127p3276112.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

