You can try the fetch filter:
https://issues.apache.org/jira/browse/NUTCH-828
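
For a sense of what a content-based relevance check could look like once the page text is available, here is a minimal Java sketch. It does not use the NUTCH-828 patch or any Nutch plugin interface; the class name KeywordRelevance, the keyword list, and the minHits threshold are all hypothetical and chosen only for illustration. Wiring such a check into a crawl would depend on the extension point the patch actually provides.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of Nutch: decides whether extracted page text
// looks health-related by counting keyword hits.
class KeywordRelevance {
    // Illustrative keyword list (an assumption); a real crawl would need a
    // curated vocabulary or a trained classifier.
    private static final Set<String> KEYWORDS =
            Set.of("health", "medicine", "disease", "nutrition", "fitness");

    // Counts how many lowercase alphabetic tokens in the text are keywords.
    static int countHits(String text) {
        int hits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (KEYWORDS.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    // A page counts as relevant when it contains at least minHits keyword tokens.
    static boolean isRelevant(String text, int minHits) {
        return countHits(text) >= minHits;
    }

    public static void main(String[] args) {
        System.out.println(isRelevant("Tips on nutrition and fitness for better health", 2));
        System.out.println(isRelevant("Latest football scores and transfers", 2));
    }
}
```

A plain keyword count is crude: it misses synonyms and will accept pages that mention health only in passing, so the threshold would need tuning against sample pages.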
 
 
-----Original message-----
> From:shekhar sharma <shekhar2...@gmail.com>
> Sent: Tue 03-Jul-2012 06:42
> To: user@nutch.apache.org
> Subject: Filtering pages during crawling
> 
> Hello,
> Is it possible to define a filtering condition in Nutch so that it fetches
> only relevant pages? For example, I am interested only in pages that
> contain health-related information, and I have given yahoo.com/health as
> the seed link. Can I apply a filtering condition to ignore all pages that
> are not related to health?
> 
> Any suggestions?
> 
> Regards,
> Som
> 
