On Monday 20 June 2011 06:36:10 [email protected] wrote:
> Hey, thanks for the response. I had lost hope of one. :-)
> 
> 3. Firstly, yes I am using Solr for indexing. So whatever you have said
> makes a lot of sense. For 404 pages, which are not picked up in crawl I am
> doing a manual delete as of now, but it is a pain. I am thinking of some
> ways to get this automated.

A 404 should be picked up on a recrawl. Keep in mind that Nutch only
refetches a URL once its fetch interval has elapsed since the last fetch,
so how soon a 404 is noticed depends on your fetch schedule.
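
The rule above can be sketched roughly as follows. This is only an
illustration of the timing logic, not Nutch code: the function name and the
30-day interval are assumptions made for the example, and in a real setup
the interval comes from your configured fetch schedule (as far as I know,
the db.fetch.interval.default property, given in seconds).

```python
from datetime import datetime, timedelta

# Assumed interval, for illustration only. In Nutch the real value comes
# from the configured fetch schedule, not from a script like this.
FETCH_INTERVAL = timedelta(days=30)


def is_due_for_refetch(last_fetch, now, interval=FETCH_INTERVAL):
    """Return True once `interval` has elapsed since `last_fetch`."""
    return now >= last_fetch + interval


last_fetch = datetime(2011, 6, 1)
print(is_due_for_refetch(last_fetch, datetime(2011, 6, 20)))  # False
print(is_due_for_refetch(last_fetch, datetime(2011, 7, 5)))   # True
```

So a page that started returning 404 shortly after its last fetch will not
be seen as gone until the interval has passed, unless you shorten the
interval or force a refetch.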

> 
> 2. Re-indexing is not the issue. The issue is that during a re-crawl, even
> when Nutch picks up a change, I don't see my index getting updated. I don't
> know what is missing. This happens only in a few cases, however.
> 
> 1. db.update.additions.allowed is set to true, so I guess things seem to be
> in place there.
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Crawling-basic-questions-tp3057896p3084813.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
