Hello, check out NUTCH-961. It adds Boilerpipe support to Nutch's Tika 
parser. It's crude but works reasonably well.
https://issues.apache.org/jira/browse/NUTCH-961

Markus
 
 
-----Original message-----
> From: Richardson, Jacquelyn F. <fluke...@ornl.gov>
> Sent: Thursday 26th March 2015 16:20
> To: user@nutch.apache.org
> Subject: Ignore navigation during index
> 
> Hi,
> 
> Is there a way to tell Nutch to ignore the navigation or footer parts of an 
> HTML page during the crawl process? Specifically, I do not want the 
> information in the navigation or footer to be indexed. My environment is 
> Windows 7 with Cygwin, Java 1.7, Nutch 1.9 (binary, not source), and Solr 4.7.
> 
> Any assistance will be greatly appreciated.
> 
> Thanks,
> Jackie
> 
> 
