Hi, thanks for helping me with this.
After crawling I checked the crawldb. There were 2400 links with status code (1), and they all got into the Solr index fine.

Thanks

Date: Tue, 23 Aug 2011 01:15:36 -0700
From: [email protected]
To: [email protected]
Subject: Re: readdblink not showing alllinks

There are a number of parameters which limit the number of outlinks for a page (IIRC 100 by default) but also the number of inlinks to consider when inverting. Have a look at nutch-default.xml and try modifying the values in nutch-site.xml

On 22 August 2011 20:31, abhayd <[hidden email]> wrote:
> hi
> thx for response.
>
> i just ran
> bin/crawl urls -dir crawl -depth 20 -threads 10
>
> And used readdblink.
>
> My understanding from Nutch 1.3 tutorial is if i use bin/crawl ( and not
> step by step approach) i dont have to do any other steps for indexing or
> reading crawl db.
>
> I am doing this
> -----------------------------------------------------
> 1. bin/crawl urls -dir crawl -depth 20 -threads 10
> 2. bin/nutch solrindex http://localhost:8080/solr/core3 crawl/crawldb
> crawl/linkdb crawl/segments/*
>
> Is that the correct approach?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/readdblink-not-showing-alllinks-tp3274127p3276112.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
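
For anyone finding this thread later: a rough sketch of what the nutch-site.xml override suggested above might look like. The property names db.max.outlinks.per.page and db.max.inlinks are what I recall from the nutch-default.xml shipped with Nutch 1.x, and the values are only illustrative, so treat both as assumptions and check them against your own nutch-default.xml before relying on them.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: overrides for values defined in nutch-default.xml. -->
<!-- Property names and values below are assumptions, not taken from the thread. -->
<configuration>
  <!-- Outlinks kept per page. The default is believed to be 100; -->
  <!-- a negative value is described as "process all outlinks". -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>-1</value>
  </property>
  <!-- Inlinks kept per URL when the links are inverted into the linkdb. -->
  <property>
    <name>db.max.inlinks</name>
    <value>100000</value>
  </property>
</configuration>
```

The pages would presumably have to be parsed again and the linkdb rebuilt after changing these values before the link dump shows any difference.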

