Hi Joe,

I've not used this feature, so it would be great if one of the others could chime in here. From what I can infer from the correspondence on the issue and the available patches, you should apply the most recent one uploaded by Markus [0] as your starting point. It is dated 22/11/2011.
On Sun, Jun 9, 2013 at 11:00 AM, Joe Zhang <smartag...@gmail.com> wrote:

>
> One of the comments mentioned the following:
>
> tika.use_boilerpipe=true
> tika.boilerpipe.extractor=ArticleExtractor|CanolaExtractor
>
> Which part of the code is it referring to?
>

You will see this included in one of the earlier patches, uploaded by Markus on 11/05/2011 [1].

>
> Also, within the current Nutch config, should I focus on parse-plugin.xml?
>

Look at the other patches and also Gabriele's comments. You will most likely need to alter something, but AFAICT the work has been done; it's just a case of pulling together several contributions. Maybe you should look at the patch for 2.x (uploaded most recently by Roland) and see what is going on there.

hth

[0] https://issues.apache.org/jira/secure/attachment/12504736/NUTCH-961-1.5-1.patch
[1] https://issues.apache.org/jira/secure/attachment/12478927/NUTCH-961-1.3-tikaparser1.patch