On Monday 20 June 2011 06:36:10 [email protected] wrote:
> Hey, thanks for the response. I had lost hope of one. :-)
>
> 3. Firstly, yes, I am using Solr for indexing. So whatever you have said
> makes a lot of sense. For 404 pages, which are not picked up in the crawl,
> I am doing a manual delete as of now, but it is a pain. I am thinking of
> some ways to get this automated.

A 404 should be picked up when doing a recrawl. Keep in mind that Nutch will only recrawl a page once some time has passed since its last fetch. How long that is depends on your fetch schedule.

> 2. Re-indexing is not the issue. The issue is that during a re-crawl, even
> when Nutch picks up a change, I don't see my index getting updated. Don't
> know what is missing. This, however, happens only in a few cases.
>
> 1. db.update.additions.allowed is set to true, so I guess things seem to be
> in place there.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Crawling-basic-questions-tp3057896p3084813.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
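
To illustrate the point about the fetch schedule: a rough sketch, in Python, of the kind of check that decides whether a page is due for a refetch. This is a simplified model and not Nutch's actual code. The function name and the 30-day default are assumptions made for illustration; in Nutch the interval is, as far as I know, set through properties such as db.fetch.interval.default, so check your own configuration for the real value.

```python
from datetime import datetime, timedelta

# Assumed interval for illustration only: 30 days, expressed in seconds.
DEFAULT_INTERVAL_SECONDS = 30 * 24 * 60 * 60


def is_due_for_refetch(last_fetch, now, interval_seconds=DEFAULT_INTERVAL_SECONDS):
    """Return True once at least interval_seconds have passed since last_fetch."""
    return now >= last_fetch + timedelta(seconds=interval_seconds)


last_fetch = datetime(2011, 6, 1, 12, 0, 0)

# 19 days after the last fetch: not due yet, so a page that has since
# turned into a 404 would not be noticed by this recrawl.
print(is_due_for_refetch(last_fetch, datetime(2011, 6, 20, 6, 36, 10)))

# 31 days after the last fetch: due, so the page is fetched again and
# the 404 can be picked up.
print(is_due_for_refetch(last_fetch, datetime(2011, 7, 2, 12, 0, 0)))
```

Under this model, a recrawl that runs before the interval has elapsed simply skips the page, which would explain a 404 or a content change not showing up right away.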

