Yes, I'm trying to do it...

One of the comments mentioned the following:

tika.use_boilerpipe=true
tika.boilerpipe.extractor=ArticleExtractor|CanolaExtractor

Which part of the code is it referring to?


Also, within the current Nutch config, should I focus on parse-plugins.xml?



On Sun, Jun 9, 2013 at 9:30 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Joe,
> Well, you can apply a patch to your source like the following:
> patch -p0 -i patch_name.patch
> You may have some problems if the Nutch code has moved on a bit: the
> patch may not apply cleanly, or may fail to apply at all. In that case you
> need to get your hands dirty and update the patch to work with your code.
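As a rough illustration of the point above, one way to find out whether a patch still applies before it touches any files is to do a dry run first. This is only a sketch: it assumes GNU patch (which provides the `--dry-run` option), that it is run from the root of the source tree, and `apply_patch_checked` is a hypothetical helper name, not something that ships with Nutch.

```shell
#!/bin/sh
# Hypothetical helper: test the patch without modifying any files,
# and apply it for real only if the dry run succeeds.
# Assumes GNU patch; other patch implementations may name the
# dry-run option differently.
apply_patch_checked() {
    patch_file="$1"   # e.g. patch_name.patch

    if patch -p0 --dry-run -i "$patch_file" >/dev/null; then
        # Dry run was clean, so apply the patch for real.
        patch -p0 -i "$patch_file"
    else
        # The source has probably moved on; the patch needs updating by hand.
        echo "Patch does not apply cleanly: $patch_file" >&2
        return 1
    fi
}
```

It would be called as `apply_patch_checked patch_name.patch`. If the dry run fails, nothing in the tree has been changed, which makes it easier to update the patch by hand and try again.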
> To answer the second part: yes, Tika is integrated into Nutch, but your
> definition of "fully" might be different from someone else's. There is an
> issue with patches available for Boilerplate, so this functionality is not
> integrated yet.
> I would start by reading the commentary on the Jira issue, then most
> likely choosing the most recent patch attachment and working with it.
> hth
>
>
> On Saturday, June 8, 2013, Joe Zhang <smartag...@gmail.com> wrote:
> > Sorry Lewis. I never really know how to work with these patches.
> >
> > Isn't Tika already fully integrated into the current version of Nutch? Which
> > config file should I touch?
> >
> >
> > On Sat, Jun 8, 2013 at 10:56 PM, Lewis John Mcgibbney <
> > lewis.mcgibb...@gmail.com> wrote:
> >
> >> Hi Joe,
> >>
> >> https://issues.apache.org/jira/browse/NUTCH-961
> >>
> >>
> >> On Saturday, June 8, 2013, Joe Zhang <smartag...@gmail.com> wrote:
> >> > Can somebody please point me to some sample code?
> >> >
> >> > Thanks much!
> >> >
> >>
> >> --
> >> *Lewis*
> >>
> >
>
> --
> *Lewis*
>
