Yes, I'm trying to do it... One of the comments mentioned the following:

tika.use_boilerpipe=true
tika.boilerpipe.extractor=ArticleExtractor|CanolaExtractor

Which part of the code is it referring to? Also, within the current Nutch config, should I focus on parse-plugins.xml?

On Sun, Jun 9, 2013 at 9:30 AM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:

> Hi Joe,
> Well, you can apply a patch to your source like the following:
> patch -p0 -i patch_name.patch
> You may have some problems if the Nutch code has moved on a bit: the
> patch may not apply cleanly or may fail to apply at all. In this case you
> need to get your hands dirty and update the patch to work with your code.
> To answer the second part, yes, Tika is integrated into Nutch, but your
> definition of fully might be different from someone else's. There is an issue
> and patches available for Boilerpipe, therefore this functionality is not
> integrated yet.
> I would start by reading the commentary on the Jira issue, then most
> likely choosing the most recent patch attachment and working with it.
> hth
>
> On Saturday, June 8, 2013, Joe Zhang <smartag...@gmail.com> wrote:
> > Sorry Lewis. I never really know how to work with these patches.
> >
> > Isn't Tika already fully integrated into the current version of Nutch? Which
> > config file should I touch?
> >
> > On Sat, Jun 8, 2013 at 10:56 PM, Lewis John Mcgibbney <
> > lewis.mcgibb...@gmail.com> wrote:
> >
> >> Hi Joe,
> >>
> >> https://issues.apache.org/jira/browse/NUTCH-961
> >>
> >> On Saturday, June 8, 2013, Joe Zhang <smartag...@gmail.com> wrote:
> >> > Can somebody please point me to some sample code?
> >> >
> >> > Thanks much!
> >> >
> >>
> >> --
> >> *Lewis*
> >>
>
> --
> *Lewis*
>
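
For anyone else unsure about the patch step quoted above, here is a minimal sketch that rehearses it on a throwaway file rather than on a real Nutch checkout. The file names (demo.txt, demo.txt.new) and their contents are made up for illustration, and the --dry-run flag assumes GNU patch. On a real checkout you would presumably run only the two patch commands from the root of the Nutch source tree, with the attachment downloaded from the Jira issue in place of patch_name.patch.

```shell
#!/bin/sh
# Sketch only: practise the patch workflow in a temporary directory.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for a source file, plus an edited copy of it.
printf 'old line\n' > demo.txt
printf 'new line\n' > demo.txt.new

# Build a unified diff. diff exits with status 1 when the files differ,
# so that exit status is tolerated here.
diff -u demo.txt demo.txt.new > patch_name.patch || true
rm demo.txt.new

# First check whether the patch would apply, without modifying anything.
patch -p0 --dry-run -i patch_name.patch

# Then apply it for real, using the command from the quoted reply.
patch -p0 -i patch_name.patch

cat demo.txt
```

If the dry run reports failed hunks, that is the "patch may not apply cleanly" case described in the reply, and the patch would need updating by hand before it can be applied.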