You can try the fetch filter: https://issues.apache.org/jira/browse/NUTCH-828

-----Original message-----
> From: shekhar sharma <shekhar2...@gmail.com>
> Sent: Tue 03-Jul-2012 06:42
> To: user@nutch.apache.org
> Subject: Filtering pages during crawling
>
> Hello,
> Is it possible to define a filtering condition in Nutch so that it fetches
> only relevant pages? For example, I am interested only in pages that
> contain health-related information, and I have given the seed link as
> yahoo.com/health. Can I apply a filtering condition to ignore all pages
> that are not related to health?
>
> Any suggestions?
>
> Regards,
> Som
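Below is a minimal sketch of the kind of relevance check a filter like this could apply to the health-only crawl in the question. It assumes that a simple keyword count is enough to decide whether a page is health-related. The keyword set, the `is_relevant` function, and the `min_hits` threshold are all hypothetical. This is plain Python for illustration only, not the NUTCH-828 plugin interface; a real Nutch filter would be written as a Java plugin.

```python
import re

# Hypothetical keyword list: an assumption for illustration, not part of Nutch.
HEALTH_KEYWORDS = {"health", "medical", "disease", "nutrition", "fitness", "symptom", "doctor"}


def is_relevant(text: str, keywords=HEALTH_KEYWORDS, min_hits: int = 3) -> bool:
    """Return True if the page text contains at least `min_hits` keyword occurrences.

    Hypothetical heuristic: counts whole-word matches, case-insensitively.
    """
    # Lowercase and split into alphabetic tokens so "Health," matches "health".
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for token in tokens if token in keywords)
    return hits >= min_hits


if __name__ == "__main__":
    page = "Nutrition tips from a doctor: how diet affects heart health and disease risk."
    print(is_relevant(page))  # True: nutrition, doctor, health, disease = 4 hits
```

A keyword count is crude; a real deployment would more likely tune the keyword list and threshold against sample pages, or use a trained classifier, before dropping content during the crawl.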