Hello Hany,

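If you use parse-tika as your HTML parser, you can enable Boilerpipe, which 
tries to strip boilerplate such as headers and footers (see the tika.extractor 
properties in nutch-default.xml).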
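
As a rough sketch, the override in conf/nutch-site.xml could look something 
like the fragment below. It assumes the tika.extractor properties as they are 
described in nutch-default.xml, and it assumes HTML is actually being handled 
by parse-tika (for example, parse-tika rather than parse-html in your 
plugin.includes). The algorithm shown is only one of the documented choices, 
so please check the descriptions in nutch-default.xml for your version.

```xml
<!-- conf/nutch-site.xml: sketch only, verify names against nutch-default.xml -->
<property>
  <name>tika.extractor</name>
  <!-- "none" disables extraction; "boilerpipe" enables Boilerpipe -->
  <value>boilerpipe</value>
</property>

<property>
  <name>tika.extractor.boilerpipe.algorithm</name>
  <!-- Documented options: DefaultExtractor, ArticleExtractor, CanolaExtractor -->
  <value>ArticleExtractor</value>
</property>
```

Boilerpipe is heuristic, so it is worth re-parsing a few representative pages 
and checking the extracted text before reindexing everything.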

Regards,
Markus

 
 
-----Original message-----
> From: hany.n...@hsbc.com <hany.n...@hsbc.com>
> Sent: Wednesday 14th November 2018 15:53
> To: user@nutch.apache.org
> Subject: Block certain parts of HTML code from being indexed
> 
> Hello All,
> 
> I am using Nutch 1.15 and am wondering whether there is a feature for 
> blocking certain parts of the HTML (such as the header and footer) from 
> being indexed.
> 
> Kind regards,
> Hany Shehata
> Solutions Architect, Marketing and Communications IT
> Corporate Functions | HSBC Operations, Services and Technology (HOST)
> ul. Kapelanka 42A, 30-347 Kraków, Poland
> __________________________________________________________________
> 
> Tie line: 7148 7689 4698
> External: +48 123 42 0698
> Mobile: +48 723 680 278
> E-mail: hany.n...@hsbc.com<mailto:hany.n...@hsbc.com>
> __________________________________________________________________
> Protect our environment - please only print this if you have to!
> 
> 
> 
