Thanks for replying. I am not able to fetch the keywords. My nutch-site.xml is:

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>My Nutch Spider</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>metatags.names</name>
    <value>description;keywords</value>
    <description>
      Names of the metatags to extract, separated by ';'. Use '*' to extract
      all metatags. Prefixes the names with 'metatag.' in the parse-metadata.
      For instance to index description and keywords, you need to activate
      the plugin index-metadata and set the value of the parameter
      'index.parse.md' to 'metatag.description;metatag.keywords'.
    </description>
  </property>
  <property>
    <name>index.parse.md</name>
    <value>metatag.description,metatag.keywords</value>
    <description>
      Comma-separated list of keys to be taken from the parse metadata to
      generate fields. Can be used e.g. for 'description' or 'keywords'
      provided that these values are generated by a parser (see
      parse-metatags plugin)
    </description>
  </property>
</configuration>

and the Solr schema has the following fields:

<fields>
  <field name="id" type="string" stored="true" indexed="true"/>

  <!-- core fields -->
  <field name="segment" type="string" stored="true" indexed="false"/>
  <field name="digest" type="string" stored="true" indexed="false"/>
  <field name="boost" type="float" stored="true" indexed="false"/>

  <!-- fields for index-basic plugin -->
  <field name="host" type="url" stored="false" indexed="true"/>
  <field name="site" type="string" stored="false" indexed="true"/>
  <field name="url" type="url" stored="true" indexed="true" required="true"/>
  <field name="content" type="text" stored="true" indexed="true"/>
  <field name="title" type="text" stored="true" indexed="true"/>
  <field name="cache" type="string" stored="true" indexed="false"/>
  <field name="tstamp" type="date" stored="true" indexed="false"/>

  <!-- fields for index-anchor plugin -->
  <field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>

  <!-- fields for index-more plugin -->
  <field name="type" type="string" stored="true" indexed="true" multiValued="true"/>
  <field name="contentLength" type="long" stored="true" indexed="false"/>
  <field name="lastModified" type="date" stored="true" indexed="false"/>
  <field name="date" type="date" stored="true" indexed="true"/>

  <!-- fields for languageidentifier plugin -->
  <field name="lang" type="string" stored="true" indexed="true"/>

  <!-- fields for subcollection plugin -->
  <field name="subcollection" type="string" stored="true" indexed="true" multiValued="true"/>

  <!-- fields for feed plugin (tag is also used by microformats-reltag) -->
  <field name="author" type="string" stored="true" indexed="true"/>
  <field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
  <field name="feed" type="string" stored="true" indexed="true"/>
  <field name="publishedDate" type="date" stored="true" indexed="true"/>
  <field name="updatedDate" type="date" stored="true" indexed="true"/>

  <!-- fields for creativecommons plugin -->
  <field name="cc" type="string" stored="true" indexed="true" multiValued="true"/>

  <!-- fields for the metatags plugin -->
  <field name="metatag.description" type="text" stored="true" indexed="true"/>
  <field name="metatag.keywords" type="text" stored="true" indexed="true"/>
</fields>

I am not able to find the problem. I have also created my own plugin, but the field is not populated when we crawl. Please help me find the reason.

On Wed, May 23, 2012 at 3:47 PM, Julien Nioche <[email protected]> wrote:

> The urlmeta plugin is not what you are after. See the instructions at
> http://wiki.apache.org/nutch/IndexMetatags
>
> On 23 May 2012 10:30, abhishek tiwari <[email protected]> wrote:
>
> > Hi, I am new to Nutch.
> >
> > I want to use the urlmeta plugin but am not able to fetch meta tags.
> >
> > 1) Added the following to nutch-site.xml:
> >
> > <property>
> >   <name>plugin.includes</name>
> >   <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)|urlmeta</value>
> >   <description>Regular expression naming plugin directory names to
> >   include. Any plugin not matching this expression is excluded.
> >   In any case you need at least include the nutch-extensionpoints plugin. By
> >   default Nutch includes crawling just HTML and plain text via HTTP,
> >   and basic indexing and search plugins.
> >   </description>
> > </property>
> > <property>
> >   <name>urlmeta.tags</name>
> >   <value></value>
> >   <description>
> >   </description>
> > </property>
> >
> > 2) Added <field name="keywords" type="string" stored="true" indexed="true"/>
> > to the Solr schema.xml.
> >
> > 3) Ran bin/nutch crawl urls -solr http://localhost:8080/solr -depth 3 -topN 5
> >
> > The URLs and other setup were also done,
> > but the keywords field is not getting populated.
> >
> > Please suggest what I am missing.
> >
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
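
One thing that can be checked mechanically is whether the three properties in nutch-site.xml agree with each other, as the metatags.names description requires: parse-metatags and index-metadata both enabled in plugin.includes, and every name in metatags.names listed in index.parse.md with the 'metatag.' prefix. Below is a minimal Python sketch of such a check. The function names (load_properties, check_metatags_config) and the SAMPLE string are made up for illustration, and treating plugin.includes as a full-match regular expression is an assumption about how Nutch applies it. It only checks that the settings are consistent; it says nothing about whether the parser actually finds the tags in a page.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a shortened copy of the nutch-site.xml quoted in this thread.
SAMPLE = """
<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>metatags.names</name>
    <value>description;keywords</value>
  </property>
  <property>
    <name>index.parse.md</name>
    <value>metatag.description,metatag.keywords</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return {name: value} for every <property> element in the config text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_metatags_config(props):
    """Return a list of inconsistencies; an empty list means none were found."""
    problems = []

    # Assumption: a plugin is enabled when its id fully matches plugin.includes.
    includes = re.compile(props.get("plugin.includes", ""))
    for plugin in ("parse-metatags", "index-metadata"):
        if not includes.fullmatch(plugin):
            problems.append("plugin.includes does not match " + plugin)

    # metatags.names is ';'-separated, index.parse.md is ','-separated.
    names = [n.strip() for n in props.get("metatags.names", "").split(";") if n.strip()]
    indexed = {k.strip() for k in props.get("index.parse.md", "").split(",") if k.strip()}
    for name in names:
        if name == "*":
            continue  # wildcard: no single key to look for
        key = "metatag." + name
        if key not in indexed:
            problems.append("index.parse.md is missing " + key)

    return problems


problems = check_metatags_config(load_properties(SAMPLE))
print(problems if problems else "settings look consistent")
```

To run it against a real file, read the file's contents and pass them to load_properties in place of SAMPLE.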

