On Thursday 03 May 2012 10:52:25 ML mail wrote:
> Thanks Markus for your tip. I now tried the "parsechecker" and it works
> perfectly. I can see the "Parse Metadata" info, which contains the keywords
> and description. So I suppose the documentation on the
> wiki http://wiki.apache.org/nutch/IndexMetatags is wrong, as it mentions
> using "indexchecker" instead...
The docs are correct. With parsechecker you can see the output of the parse
filters; with indexchecker you only see the output of the index filters. You
need both a parse filter and an index filter to complete the chain from web
page to indexed document.

>
> ________________________________
> From: Markus Jelsma <markus.jel...@openindex.io>
> To: ML mail <mlnos...@yahoo.com>
> Cc: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>; user@nutch.apache.org
> Sent: Thursday, May 3, 2012 9:32 AM
> Subject: Re: Indexing meta tags in Nutch 1.4
>
> You should see it with the parsechecker tool but not with the indexchecker,
> because you don't have an indexing filter plugin included that reads and
> emits what's output by the parse filter. Use the index-metadata plugin.
>
> On Thu, 3 May 2012 00:25:42 -0700 (PDT), ML mail <mlnos...@yahoo.com> wrote:
> > Dear Lewis,
> >
> > Thanks for the README about the parse-metatags plugin. I have now
> > double-checked and I have the metatags.names property in my
> > nutch-site.xml config file as well as the other required properties.
> > Still, when running "nutch indexchecker URL" I don't see any
> > description or keywords fields :(
> >
> > Below I have pasted the relevant parts of my nutch-site.xml config file:
> >
> > <property>
> >   <name>index.parse.md</name>
> >   <value>metatag.description,metatag.keywords</value>
> > </property>
> >
> > <property>
> >   <name>metatags.names</name>
> >   <value>description;keywords</value>
> > </property>
> >
> > <property>
> >   <name>plugin.includes</name>
> >   <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> > </property>
> >
> > As far as I know this all looks correct, but maybe you can see
> > something wrong, or anything else I might check?
> >
> > Regards
> >
> > ________________________________
> > From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
> > To: user@nutch.apache.org; ML mail <mlnos...@yahoo.com>
> > Sent: Wednesday, May 2, 2012 12:49 PM
> > Subject: Re: Indexing meta tags in Nutch 1.4
> >
> > Hi,
> >
> > Please also see the README Julien kindly provided with the
> > parse-metatags plugin.
> >
> > https://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-metatags/README.txt?view=markup
> >
> > I'm hoping there should be enough info to get it working flawlessly.
> > Remember, any changes you make to your config files should really be
> > recompiled before moving on to a more serious deployment.
> >
> > On Tue, May 1, 2012 at 12:38 PM, ML mail <mlnos...@yahoo.com> wrote:
> >> Hi Lewis,
> >>
> >> Thanks to your explanations, I managed to get the parse-metatags plugin
> >> built and installed into the runtime/local/plugins directory. So now I
> >> have the index-metatags plugin from the ZIP file as well as the
> >> parse-metatags plugin from the patch installed, and wanted to check if
> >> they are working. I followed the guide step by step
> >> on http://wiki.apache.org/nutch/IndexMetatags and came to the part
> >> where you check for the metatag fields with the "nutch indexchecker URL"
> >> command. Unfortunately, in the output of that command I don't
> >> see any keywords or description fields :( just the usual ones
> >> (site, title, content, etc.).
> >>
> >> Am I missing something here?
> >>
> >> Also let me know if you need more details or my nutch-site.xml config
> >> file...
> >>
> >> Regards
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536600 / 06-50258350
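The two-stage chain Markus describes (a parse filter that produces parse metadata, then an index filter that turns selected metadata into document fields) can be illustrated with a toy, self-contained Java program. This is only a sketch under stated assumptions: the class and method names are hypothetical and are not part of the Nutch API. The `metatag.` key prefix and the separators (`;` for `metatags.names`, `,` for `index.parse.md`) are taken from the nutch-site.xml values in this thread. The first stage plays the role of parse-metatags (roughly what parsechecker shows as "Parse Metadata"); the second plays the role of index-metadata (roughly what indexchecker shows as fields).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the parse-filter -> index-filter chain.
// Hypothetical names: none of these classes or methods exist in Nutch.
public class MetatagChainSketch {

    // Stage 1, parse-filter role (cf. parse-metatags): keep only the meta tags
    // listed in metatags.names (';'-separated) and store them as parse metadata
    // under a "metatag." prefix (the prefix used by index.parse.md in this thread).
    static Map<String, String> parseFilter(Map<String, String> pageMetaTags, String metatagsNames) {
        Map<String, String> parseMeta = new LinkedHashMap<>();
        for (String name : metatagsNames.split(";")) {
            String key = name.trim().toLowerCase();
            String value = pageMetaTags.get(key);
            if (value != null) {
                parseMeta.put("metatag." + key, value);
            }
        }
        return parseMeta;
    }

    // Stage 2, index-filter role (cf. index-metadata): copy the parse-metadata
    // keys listed in index.parse.md (','-separated) into document fields.
    static Map<String, String> indexFilter(Map<String, String> parseMeta, String indexParseMd) {
        Map<String, String> docFields = new LinkedHashMap<>();
        for (String rawKey : indexParseMd.split(",")) {
            String key = rawKey.trim();
            String value = parseMeta.get(key);
            if (value != null) {
                docFields.put(key, value);
            }
        }
        return docFields;
    }

    public static void main(String[] args) {
        // Hypothetical meta tags found on a fetched page.
        Map<String, String> page = new LinkedHashMap<>();
        page.put("description", "A sample page");
        page.put("keywords", "nutch,metatags");
        page.put("author", "someone");

        // Roughly what parsechecker reports as "Parse Metadata".
        Map<String, String> parseMeta = parseFilter(page, "description;keywords");
        System.out.println("parse metadata: " + parseMeta);

        // Roughly what indexchecker reports, but only when an index filter runs.
        Map<String, String> fields = indexFilter(parseMeta, "metatag.description,metatag.keywords");
        System.out.println("index fields:   " + fields);
    }
}
```

If the second stage never runs, the parse metadata still exists but never becomes a document field, which matches seeing the keywords in parsechecker but not in indexchecker.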