Are you using the depth parameter with the crawl command or are you using the separate generate, fetch etc. commands?
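For reference, the two ways of driving the crawl look roughly like this — the directory names, depth and topN values are assumptions for illustration, not taken from this thread:

```shell
# One-shot: the crawl command runs the inject/generate/fetch/parse/update
# loop itself, following outlinks for the given number of rounds.
# With -depth 1 only the seed URLs are fetched, which would explain
# seeing nothing beyond greetingcard.html.
bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

# Equivalent manual cycle: each pass fetches one generation of outlinks,
# so repeat the loop once per level of depth you want to reach.
bin/nutch inject crawl/crawldb urls
for i in 1 2 3; do
  bin/nutch generate crawl/crawldb crawl/segments -topN 1000
  segment=$(ls -d crawl/segments/* | tail -1)    # newest segment
  bin/nutch fetch "$segment"
  bin/nutch parse "$segment"
  bin/nutch updatedb crawl/crawldb "$segment"    # adds newly found outlinks
done
```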
What's $ nutch readdb <crawldb> -stats returning?

On Wednesday 09 February 2011 15:06:40 .: Abhishek :. wrote:
> Hi Markus,
>
> I am sorry for not being clear; I meant to say that...
>
> Suppose a URL, say www.somehost.com/gifts/greetingcard.html (which in turn
> contains links to a.html, b.html, c.html and d.html), is injected into
> seed.txt. After the whole process I was expecting a bunch of other pages
> crawled from this seed URL. However, at the end all I see is the content
> of only this page, www.somehost.com/gifts/greetingcard.html, and I do not
> see any of the other pages (a.html, b.html, c.html, d.html) crawled from it.
>
> The crawl covers only the URLs mentioned in seed.txt and does not proceed
> further from there, so I am a bit confused. Why is it not crawling the
> linked pages (a.html, b.html, c.html and d.html)? I get the feeling I am
> missing something that the author of the blog
> (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) assumed
> everyone would know.
>
> Thanks,
> Abi
>
> On Wed, Feb 9, 2011 at 7:09 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > The parsed data is only sent to the Solr index if you tell a segment to
> > be indexed: solrindex <crawldb> <linkdb> <segment>
> >
> > If you did this only once after injecting and then ran the subsequent
> > fetch, parse, update, index sequence, then you, of course, only see
> > those URLs. If you don't index a segment after it has been parsed, you
> > need to do it later on.
> >
> > On Wednesday 09 February 2011 04:29:44 .: Abhishek :. wrote:
> > > Hi all,
> > >
> > > I am a newbie to Nutch and Solr. Well, relatively much newer to Solr
> > > than Nutch :)
> > >
> > > I have been using Nutch for the past two weeks, and I wanted to know
> > > whether I can query or search my Nutch crawls on the fly (before they
> > > complete). I am asking this because the websites I am crawling are
> > > really huge and it takes around 3-4 days for a crawl to complete. I
> > > want to analyze some quick results while the Nutch crawler is still
> > > crawling the URLs. Someone suggested that Solr would make this
> > > possible.
> > >
> > > I followed the steps in
> > > http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ for this.
> > > By this process, I see only the injected URLs in the Solr search. I
> > > know I did something really foolish and the crawl never happened; I
> > > feel I am missing some information here. I think somewhere in the
> > > process a crawl should have happened and I missed it.
> > >
> > > Just wanted to see if someone could help me point out where I went
> > > wrong in the process. Forgive my foolishness and thanks for your
> > > patience.
> > >
> > > Cheers,
> > > Abi
> >
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
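The stats check and the per-segment indexing discussed in this thread can be sketched as follows — the crawl directory layout and the Solr URL are assumptions for illustration:

```shell
# Inspect the crawldb: the TOTAL urls vs db_fetched counts show whether
# outlinks were ever discovered and fetched beyond the injected seeds.
bin/nutch readdb crawl/crawldb -stats

# Build the linkdb and push every parsed segment to Solr, not just the
# first one; segments indexed this way become searchable while later
# fetch rounds are still running.
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
for segment in crawl/segments/*; do
  bin/nutch solrindex http://localhost:8983/solr \
    crawl/crawldb crawl/linkdb "$segment"
done
```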