Hi,

Nutch 1.5 has a -deleteGone switch for the SolrIndexer job. It deletes permanent redirects and 404s that were discovered during the crawl. Nutch 1.6 also has a -deleteRobotsNoIndex switch that deletes pages whose robots meta tag has a noindex value.

-----Original message-----
> From: David Philip <davidphilipshe...@gmail.com>
> Sent: Wed 26-Dec-2012 06:28
> To: user@nutch.apache.org
> Subject: Nutch approach for DeadLinks
>
> Hi All,
>
> How does Nutch work with dead links? Say, for example, there is a blog
> site being crawled today and all the blogs (documents) are indexed to Solr.
> Tomorrow, if one of the blogs is deleted, the URL indexed
> yesterday no longer works today. In such cases, how do I update the Solr
> indexes so that this particular blog doesn't come up in search results?
> Recrawling the same site didn't delete this record in Solr. How do I handle
> such cases? I am using the Nutch 1.5.1 bin.
>
> Thanks
> David
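
To make the switches mentioned in the reply concrete, here is a minimal Python sketch that only assembles a solrindex command line and prints it; it does not run Nutch. The function name build_solrindex_command, the Solr URL, and the crawl/ paths are hypothetical placeholders. The argument order (Solr URL, crawldb, optional -linkdb, -dir for segments, then the switches) is an assumption about the 1.5/1.6-era usage string, so check it against the usage your own version prints when you run bin/nutch solrindex with no arguments.

```python
import shlex


def build_solrindex_command(solr_url, crawldb, segments_dir, linkdb=None,
                            delete_gone=True, delete_robots_noindex=False,
                            nutch_bin="bin/nutch"):
    """Assemble an argument list for the solrindex job (nothing is executed).

    Assumption: positional order is <solr url> <crawldb>, then the optional
    -linkdb, then -dir <segments>, then the delete switches.
    """
    cmd = [nutch_bin, "solrindex", solr_url, crawldb]
    if linkdb:
        cmd += ["-linkdb", linkdb]
    cmd += ["-dir", segments_dir]
    if delete_gone:
        # Switch named in the reply: removes 404s and permanent redirects.
        cmd.append("-deleteGone")
    if delete_robots_noindex:
        # Per the reply, only available from Nutch 1.6 onwards.
        cmd.append("-deleteRobotsNoIndex")
    return cmd


if __name__ == "__main__":
    # Placeholder URL and paths; substitute your own.
    cmd = build_solrindex_command(
        "http://localhost:8983/solr/",
        "crawl/crawldb",
        "crawl/segments",
        linkdb="crawl/linkdb",
    )
    # Print a shell-quoted command that can be reviewed before running it.
    print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to try: you can compare the output with your version's usage text before running anything against a live index.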