On Thursday 03 May 2012 10:52:25 ML mail wrote:
> Thanks Markus for your tip. I now tried the "parsechecker" and it works
> perfectly, I can see the "Parse Metadata" info which contains the keywords
> and description. I then suppose the documentation on the
> wiki http://wiki.apache.org/nutch/IndexMetatags is wrong as it mentions
> using "indexchecker" instead...

The docs are correct. With parsechecker you can see the output of a parse
filter. With indexchecker you can only see the output of an index filter. You
need both a parse filter and an index filter to complete the chain from web
page to indexed document.
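
As a rough illustration of that chain, here is a toy Python model. It is not
Nutch code: the function names parse_filter and index_filter, the regex-based
meta tag extraction and the sample HTML are all made up for this sketch. Only
the "metatag." key prefix and the two property values mirror the config quoted
below. It shows why the fields appear at the parse stage but reach the indexed
document only when an index filter copies them over.

```python
import re


def parse_filter(html, metatag_names):
    """Toy stand-in for a parse filter: store selected <meta> tags as parse metadata."""
    parse_md = {}
    pattern = r'<meta\s+name="([^"]+)"\s+content="([^"]*)"'
    for name, content in re.findall(pattern, html, flags=re.IGNORECASE):
        if name.lower() in metatag_names:
            # Assumption: keys carry a "metatag." prefix, as in the quoted config.
            parse_md["metatag." + name.lower()] = content
    return parse_md


def index_filter(parse_md, index_parse_md):
    """Toy stand-in for an index filter: copy the listed parse metadata keys into the document."""
    return {key: parse_md[key] for key in index_parse_md if key in parse_md}


# Hypothetical page used only for this illustration.
html = (
    '<html><head>'
    '<meta name="description" content="A test page">'
    '<meta name="keywords" content="nutch,metatags">'
    '</head></html>'
)

# Stage 1: roughly what a parse-stage check would show ("Parse Metadata").
parse_md = parse_filter(html, {"description", "keywords"})

# Stage 2a: no index filter configured for these keys, so nothing reaches the document.
doc_without = index_filter(parse_md, [])

# Stage 2b: index filter configured with the keys to copy.
doc_with = index_filter(parse_md, ["metatag.description", "metatag.keywords"])

print(parse_md)
print(doc_without)
print(doc_with)
```

In this model the parse-stage output always contains the two metatag entries,
while the index-stage output contains them only in the second case.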

> 
> 
> 
> 
> ________________________________
>  From: Markus Jelsma <markus.jel...@openindex.io>
> To: ML mail <mlnos...@yahoo.com>
> Cc: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>; user@nutch.apache.org
> Sent: Thursday, May 3, 2012 9:32 AM
> Subject: Re: Indexing meta tags in Nutch 1.4
> 
> You should see it with the parsechecker tool but not with the indexchecker
> because you don't have an indexing filter plugin included that reads and
> emits what's output by the parse filter. Use the index-metadata plugin.
> 
> On Thu, 3 May 2012 00:25:42 -0700 (PDT), ML mail <mlnos...@yahoo.com> wrote:
> > Dear Lewis,
> > 
> > Thanks for the README about the parse-metatags plugin. I have now
> > double checked and I have the metatags.names property in my
> > nutch-site.xml config file as well as the other required properties.
> > Still when running "nutch indexchecker URL" I don't see any
> > description or keywords fields :( 
> > 
> > Below I have pasted the relevant parts of my nutch-site.xml config file:
> > 
> > <property>
> >         <name>index.parse.md</name>
> >         <value>metatag.description,metatag.keywords</value>
> > </property>
> > 
> > 
> > <property>
> >         <name>metatags.names</name>
> >         <value>description;keywords</value>
> > </property>
> > 
> > 
> > <property>
> >         <name>plugin.includes</name>
> >         <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> > </property>
> > 
> > As far as I know this all looks correct but maybe you can see
> > something wrong? or anything else I might check?
> > 
> > Regards
> > 
> > 
> > 
> > ________________________________
> >
> >  From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
> >
> > To: user@nutch.apache.org; ML mail <mlnos...@yahoo.com>
> > Sent: Wednesday, May 2, 2012 12:49 PM
> > Subject: Re: Indexing meta tags in Nutch 1.4
> > 
> > Hi,
> > 
> > Please also see the README Julien kindly provided with the
> > parse-metatags plugin.
> > 
> > 
> > https://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-metatags/README.txt?view=markup
> > 
> > I'm hoping there should be enough info to get it working flawlessly.
> > Remember, any changes you make to your config files should really be
> > recompiled before moving on to a more serious deployment.
> > 
> > On Tue, May 1, 2012 at 12:38 PM, ML mail <mlnos...@yahoo.com> wrote:
> >> Hi Lewis,
> >> 
> >> Thanks to your explanations, I managed to get the parse-metatags plugin
> >> built and installed into the runtime/local/plugins directory. So now I
> >> have the index-metatags from the ZIP file as well as the parse-metatags
> >> plugin from the patch installed and wanted to check if they are
> >> working. I followed step-by-step the guide
> >> on http://wiki.apache.org/nutch/IndexMetatags and came to the part
> >> where you check with the "nutch indexchecker URL" command for the
> >> metatag fields. Unfortunately, in the output of that command I don't
> >> see any keywords or description fields :( just the usual ones
> >> (site,title,content,etc).
> >> 
> >> Am I missing something here?
> >> 
> >> Also let me know if you need more details or my nutch-site.xml config
> >> file...
> >> 
> >> Regards
> 
> -- Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536600 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
