Hi - Nutch 1.5 has a -deleteGone switch for the SolrIndexer job. This will 
delete permanent redirects and 404s that have been discovered during the 
crawl. 1.6 also has a -deleteRobotsNoIndex switch that will delete pages that 
have a robots meta tag with a noindex value.
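
For example, assuming a typical crawl directory layout and a local Solr 
instance (the paths and Solr URL below are just placeholders for your own 
setup), the indexing step would look something like:

  bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/* -deleteGone

With -deleteGone set, pages that came back as 404s or permanent redirects 
during the recrawl are removed from the Solr index instead of lingering as 
stale results.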
 
 
-----Original message-----
> From:David Philip <davidphilipshe...@gmail.com>
> Sent: Wed 26-Dec-2012 06:28
> To: user@nutch.apache.org
> Subject: Nutch approach for DeadLinks
> 
> Hi  All,
> 
>      How does Nutch handle dead links? Say, for example, a blog site is
> crawled today and all the blog posts (documents) are indexed to Solr.
> Tomorrow, one of the posts is deleted, meaning the URL indexed yesterday
> no longer works. In such cases, how do I update the Solr index so that
> this particular post no longer appears in search results? Recrawling the
> same site didn't delete the record from Solr. How should I handle such
> cases? I am using Nutch 1.5.1 (binary distribution). Thanks, David
> 
