_boilerpipe
> > > true
> > >
> > >
> > > tika.boilerpipe.extractor
> > > ArticleExtractor
> > >
> > >
> > > and it should work
> > >
> > > -Original message-
ome patching (see linked issues) and manual upgrade to
> Boilerpipe 1.2.0.
>
> -Original message-
> > From:Joe Zhang
> > Sent: Tue 11-Jun-2013 21:19
> > To: user
> > Subject: Re: using Tika within Nutch to remove boiler plates?
> >
> >
acted text as a whole and does not consider page semantics.
> >
> > [1]: https://issues.apache.org/jira/browse/NUTCH-1414
> >
> > -----Original message-
> > > From:Joe Zhang
> > > Sent: Tue 11-Jun-2013 18:06
> > > To: user
Original message-
> > From:Joe Zhang
> > Sent: Tue 11-Jun-2013 18:06
> > To: user
> > Subject: Re: using Tika within Nutch to remove boiler plates?
> >
> > Any particular reason why you don't use boilerpipe any more? So what do
> you
> > suggest
es not consider page semantics.
[1]: https://issues.apache.org/jira/browse/NUTCH-1414
-Original message-
> From:Joe Zhang
> Sent: Tue 11-Jun-2013 18:06
> To: user
> Subject: Re: using Tika within Nutch to remove boiler plates?
>
> Any particular reason why you don
utch-site.xml as
>
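> <property>
>   <name>tika.use_boilerpipe</name>
>   <value>true</value>
> </property>
> <property>
>   <name>tika.boilerpipe.extractor</name>
>   <value>ArticleExtractor</value>
> </property>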
>
> and it should work
>
> -Original message-
> > From:Joe Zhang
> > Sent: Tue 11-Jun-2013 01:42
> > To: user
> > Subje
11-Jun-2013 01:42
> To: user
> Subject: Re: using Tika within Nutch to remove boiler plates?
>
> Markus, do you mind sharing a sample nutch-site.xml?
>
>
> On Mon, Jun 10, 2013 at 1:42 AM, Markus Jelsma
> wrote:
>
> > Those settings belong to nutch-site. Enable
ibbney
> > Sent: Sun 09-Jun-2013 20:47
> > To: user@nutch.apache.org
> > Subject: Re: using Tika within Nutch to remove boiler plates?
> >
> > Hi Joe,
> > I've not used this feature, it would be great if one of the others could
> > chime in here.
>
Those settings belong to nutch-site. Enable BP and set the correct extractor
and it should work just fine.
-Original message-
> From:Lewis John Mcgibbney
> Sent: Sun 09-Jun-2013 20:47
> To: user@nutch.apache.org
> Subject: Re: using Tika within Nutch to remove boiler pla
Hi Joe,
I've not used this feature, it would be great if one of the others could
chime in here.
From what I can infer from the correspondence on the issue, and the
available patches, you should be applying the most recent one uploaded by
Markus [0] as your starting point. This is dated as 22/11/20
Yes, I'm trying to do it...
One of the comments mentioned the following:
tika.use_boilerpipe=true
tika.boilerpipe.extractor=ArticleExtractor|CanolaExtractor
Which part of the code is it referring to?
Also, within the current Nutch config, should I focus on parse-plugin.xml?
On Sun, Jun 9, 2013
Hi Joe,
Well, you can apply a patch to your source like the following:
patch -p0 -i patch_name.patch
You may have some problems if the Nutch code has moved on a bit and the
patch may not apply cleanly or may fail to apply at all. In this case you
need to get your hands dirty and update the patch
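
To check whether a patch still applies before touching the tree, a dry run can help. The following is only a sketch, assuming GNU patch (for the --dry-run option); the scratch directory, src/example.conf and the patch contents are hypothetical stand-ins, not the actual NUTCH-961 patch.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so no real source tree is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for a file in the source tree.
mkdir -p src
printf 'tika.use_boilerpipe=false\n' > src/example.conf

# Hypothetical -p0 style patch: paths are relative to the current directory.
cat > patch_name.patch <<'EOF'
--- src/example.conf
+++ src/example.conf
@@ -1 +1 @@
-tika.use_boilerpipe=false
+tika.use_boilerpipe=true
EOF

# Dry run first: reports whether the hunks would apply, changes nothing.
if patch -p0 --dry-run -i patch_name.patch; then
    # The dry run succeeded, so apply the patch for real.
    patch -p0 -i patch_name.patch
else
    echo "Patch does not apply cleanly; it needs updating by hand." >&2
fi

# Show the file after patching.
result=$(cat src/example.conf)
echo "$result"
```

If the dry run reports failed hunks, nothing has been modified, so the source stays clean while the patch is updated.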
Sorry Lewis. I never really know how to work with these patches.
Isn't Tika already fully integrated into the current version of Nutch? Which
config file should I touch?
On Sat, Jun 8, 2013 at 10:56 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Hi Joe,
>
> https://issues.apache.org
Hi Joe,
https://issues.apache.org/jira/browse/NUTCH-961
On Saturday, June 8, 2013, Joe Zhang wrote:
> Can somebody please point me to some sample code?
>
> Thanks much!
>
--
*Lewis*