Re: using Tika within Nutch to remove boiler plates?

Lewis John Mcgibbney Sun, 09 Jun 2013 11:48:13 -0700

Hi Joe,
I've not used this feature, so it would be great if one of the others could
chime in here.
From what I can infer from the correspondence on the issue and the
available patches, you should use the most recent one uploaded by
Markus [0] as your starting point. It is dated 22/11/2011.


On Sun, Jun 9, 2013 at 11:00 AM, Joe Zhang <smartag...@gmail.com> wrote:

>
> One of the comments mentioned the following:
>
> tika.use_boilerpipe=true
> tika.boilerpipe.extractor=ArticleExtractor|CanolaExtractor
>
> Which part of the code is it referring to?
>
>
You will find these settings in one of the earlier patches uploaded by Markus
on 11/05/2011 [1].
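As a rough illustration of what those two settings control, here is a minimal Java sketch that reads them as plain `key=value` properties. It assumes the property names exactly as quoted from the patch [1] (they are not guaranteed to exist in any released Nutch), and it assumes that `ArticleExtractor|CanolaExtractor` lists two alternative values rather than one literal string. The class `BoilerpipeSettings` and its default of `ArticleExtractor` are hypothetical and are not taken from the patch. In a real install you would set these properties in conf/nutch-site.xml rather than parse them yourself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper: interprets the two settings quoted from the NUTCH-961 patch.
final class BoilerpipeSettings {
    // Assumption: "ArticleExtractor|CanolaExtractor" names two alternatives; pick one.
    static final Set<String> KNOWN_EXTRACTORS = Set.of("ArticleExtractor", "CanolaExtractor");

    public final boolean useBoilerpipe;
    public final String extractor;

    BoilerpipeSettings(Properties props) {
        // Boilerplate removal stays off unless explicitly enabled.
        useBoilerpipe = Boolean.parseBoolean(props.getProperty("tika.use_boilerpipe", "false"));
        // Hypothetical default; the patch may choose differently.
        extractor = props.getProperty("tika.boilerpipe.extractor", "ArticleExtractor");
        if (useBoilerpipe && !KNOWN_EXTRACTORS.contains(extractor)) {
            throw new IllegalArgumentException("Unknown boilerpipe extractor: " + extractor);
        }
    }

    static BoilerpipeSettings fromString(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text)); // parses key=value lines
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return new BoilerpipeSettings(props);
    }

    public static void main(String[] args) {
        BoilerpipeSettings s = fromString(
            "tika.use_boilerpipe=true\ntika.boilerpipe.extractor=CanolaExtractor\n");
        System.out.println(s.useBoilerpipe + " " + s.extractor);
    }
}
```

Under this reading, copying the comment's line literally (with the `|`) would be rejected, since the value must name a single extractor.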


>
> Also, within the current Nutch config, should I focus on parse-plugins.xml?
>
>
Look at the other patches and also at Gabriele's comments. You will most likely
need to alter something, but AFAICT the work has been done; it's just a case
of pulling together several contributions.

Maybe you should look at the patch for 2.x (uploaded most recently by
Roland) and see what is going on there.

hth

[0]
https://issues.apache.org/jira/secure/attachment/12504736/NUTCH-961-1.5-1.patch
[1]
https://issues.apache.org/jira/secure/attachment/12478927/NUTCH-961-1.3-tikaparser1.patch