Those settings belong in nutch-site.xml. Enable Boilerpipe (BP) and set the
correct extractor, and it should work just fine.
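
As a rough sketch, the override could look like the fragment below. It assumes
the NUTCH-961 patch is applied and that the property names are the ones quoted
from the JIRA comment further down (tika.use_boilerpipe and
tika.boilerpipe.extractor); the names may differ depending on which patch or
Nutch version you end up with, so check them against the patched
nutch-default.xml. I am also assuming the "|" in the comment means "pick one",
and have chosen ArticleExtractor purely as an example.

```xml
<!-- conf/nutch-site.xml (fragment; place inside the existing <configuration> element) -->
<!-- Property names are taken from the JIRA comment and assume the NUTCH-961 patch is applied -->
<property>
  <name>tika.use_boilerpipe</name>
  <value>true</value>
</property>
<property>
  <name>tika.boilerpipe.extractor</name>
  <!-- Assumption: choose one of the extractors named in the comment -->
  <value>ArticleExtractor</value>
</property>
```
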
-----Original message-----
> From:Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
> Sent: Sun 09-Jun-2013 20:47
> To: user@nutch.apache.org
> Subject: Re: using Tika within Nutch to remove boiler plates?
>
> Hi Joe,
> I've not used this feature, so it would be great if one of the others could
> chime in here.
> From what I can infer from the correspondence on the issue and the
> available patches, you should use the most recent one uploaded by
> Markus [0] as your starting point. It is dated 22/11/2011.
>
> On Sun, Jun 9, 2013 at 11:00 AM, Joe Zhang <smartag...@gmail.com> wrote:
>
> >
> > One of the comments mentioned the following:
> >
> > tika.use_boilerpipe=true
> > tika.boilerpipe.extractor=ArticleExtractor|CanolaExtractor
> >
> > Which part of the code is it referring to?
> >
> >
> You will see this included in one of the earlier patches uploaded by Markus
> on 11/05/2011 [1]
>
>
> >
> > Also, within the current Nutch config, should I focus on parse-plugins.xml?
> >
> >
> Look at the other patches and also Gabriele's comments. You will most likely
> need to alter something, but AFAICT the work has been done; it's just a case
> of pulling together several contributions.
>
> Maybe you should look at the patch for 2.x (uploaded most recently by
> Roland) and see what is going on there.
>
> hth
>
> [0]
> https://issues.apache.org/jira/secure/attachment/12504736/NUTCH-961-1.5-1.patch
> [1]
> https://issues.apache.org/jira/secure/attachment/12478927/NUTCH-961-1.3-tikaparser1.patch
>