I am not able to see the metadata plugin in the list of plugins that get registered:

2012-05-29 18:45:07,701 INFO plugin.PluginRepository - Plugins: looking in: /var/www/html/nutch/runtime/local/plugins
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Registered Plugins:
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Registered Extension-Points:
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
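
To cross-check the log against plugin.includes, a small script along these lines may help. It is only a sketch: it assumes Nutch compares each plugin id against the whole plugin.includes expression, and the function names registered_ids and included_but_not_registered are made up for illustration.

```python
import re

# plugin.includes value from the nutch-site.xml quoted in this thread.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika|metatags)|"
    "index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)"
)

# A few lines copied from the log in this message; paste the full log in practice.
LOG = """\
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2012-05-29 18:45:07,759 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2012-05-29 18:45:07,760 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
"""


def registered_ids(log_text):
    """Collect the ids printed in trailing parentheses on each log line.

    Dotted names such as org.apache.nutch.parse.Parser (extension points)
    are skipped because the pattern only allows word characters and hyphens.
    """
    return set(re.findall(r"\(([\w-]+)\)\s*$", log_text, flags=re.MULTILINE))


def included_but_not_registered(candidates, includes=PLUGIN_INCLUDES, log_text=LOG):
    """Return candidate ids that the expression accepts but the log does not list."""
    pattern = re.compile(includes)
    seen = registered_ids(log_text)
    # Assumption: the whole id has to match the expression, hence fullmatch.
    return [c for c in candidates if pattern.fullmatch(c) and c not in seen]


if __name__ == "__main__":
    ids_to_check = ["parse-html", "parse-metatags", "index-basic", "index-metadata"]
    print(included_but_not_registered(ids_to_check))
```

An id reported by this check is accepted by plugin.includes but was never registered, which would point at the plugins directory shown in the first log line rather than at the expression itself. Neither parse-metatags nor index-metadata appears in the log above.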

On Tue, May 29, 2012 at 6:37 PM, abhishek tiwari <[email protected]> wrote:

> Thanks for replying. I am not able to fetch the keywords.
> My nutch-site.xml is:
>
> <configuration>
>   <property>
>     <name>http.agent.name</name>
>     <value>My Nutch Spider</value>
>   </property>
>   <property>
>     <name>plugin.includes</name>
>     <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>   </property>
>   <property>
>     <name>metatags.names</name>
>     <value>description;keywords</value>
>     <description> Names of the metatags to extract, separated by ';'.
>     Use '*' to extract all metatags. Prefixes the names with 'metatag.'
>     in the parse-metadata. For instance to index description and keywords,
>     you need to activate the plugin index-metadata and set the value of the
>     parameter 'index.parse.md' to 'metatag.description;metatag.keywords'.
>     </description>
>   </property>
>   <property>
>     <name>index.parse.md</name>
>     <value>metatag.description,metatag.keywords</value>
>     <description>
>     Comma-separated list of keys to be taken from the parse metadata to
>     generate fields.
>     Can be used e.g. for 'description' or 'keywords' provided that these
>     values are generated
>     by a parser (see parse-metatags plugin)
>     </description>
>   </property>
> </configuration>
>
> and the Solr schema has the following fields:
>
> <fields>
>   <field name="id" type="string" stored="true" indexed="true"/>
>   <!-- core fields -->
>   <field name="segment" type="string" stored="true" indexed="false"/>
>   <field name="digest" type="string" stored="true" indexed="false"/>
>   <field name="boost" type="float" stored="true" indexed="false"/>
>   <!-- fields for index-basic plugin -->
>   <field name="host" type="url" stored="false" indexed="true"/>
>   <field name="site" type="string" stored="false" indexed="true"/>
>   <field name="url" type="url" stored="true" indexed="true" required="true"/>
>   <field name="content" type="text" stored="true" indexed="true"/>
>   <field name="title" type="text" stored="true" indexed="true"/>
>   <field name="cache" type="string" stored="true" indexed="false"/>
>   <field name="tstamp" type="date" stored="true" indexed="false"/>
>   <!-- fields for index-anchor plugin -->
>   <field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
>   <!-- fields for index-more plugin -->
>   <field name="type" type="string" stored="true" indexed="true" multiValued="true"/>
>   <field name="contentLength" type="long" stored="true" indexed="false"/>
>   <field name="lastModified" type="date" stored="true" indexed="false"/>
>   <field name="date" type="date" stored="true" indexed="true"/>
>   <!-- fields for languageidentifier plugin -->
>   <field name="lang" type="string" stored="true" indexed="true"/>
>   <!-- fields for subcollection plugin -->
>   <field name="subcollection" type="string" stored="true" indexed="true" multiValued="true"/>
>   <!-- fields for feed plugin (tag is also used by microformats-reltag)-->
>   <field name="author" type="string" stored="true" indexed="true"/>
>   <field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
>   <field name="feed" type="string" stored="true" indexed="true"/>
>   <field name="publishedDate" type="date" stored="true" indexed="true"/>
>   <field name="updatedDate" type="date" stored="true" indexed="true"/>
>   <!-- fields for creativecommons plugin -->
>   <field name="cc" type="string" stored="true" indexed="true" multiValued="true"/>
>   <!-- fields for the metatags plugin -->
>   <field name="metatag.description" type="text" stored="true" indexed="true"/>
>   <field name="metatag.keywords" type="text" stored="true" indexed="true"/>
> </fields>
>
> I am not able to find the problem.
>
> I have created my own plugin, but it is not populated when we crawl.
>
> Please help me find the reason.
>
>
> On Wed, May 23, 2012 at 3:47 PM, Julien Nioche <[email protected]> wrote:
>
>> The urlmeta plugin is not what you are after. See the instructions on
>> http://wiki.apache.org/nutch/IndexMetatags
>>
>> On 23 May 2012 10:30, abhishek tiwari <[email protected]> wrote:
>>
>> > Hi, I am new to Nutch.
>> >
>> > I want to use the urlmeta plugin but am not able to fetch meta tags.
>> >
>> > 1) Added the following in nutch-site.xml:
>> >
>> > <property>
>> >   <name>plugin.includes</name>
>> >   <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)|urlmeta</value>
>> >   <description>Regular expression naming plugin directory names to
>> >   include. Any plugin not matching this expression is excluded.
>> >   In any case you need at least include the nutch-extensionpoints plugin. By
>> >   default Nutch includes crawling just HTML and plain text via HTTP,
>> >   and basic indexing and search plugins.
>> >   </description>
>> > </property>
>> > <property>
>> >   <name>urlmeta.tags</name>
>> >   <value></value>
>> >   <description>
>> >   </description>
>> > </property>
>> >
>> > 2) Added <field name="keywords" type="string" stored="true"
>> > indexed="true"/> in the Solr schema.xml
>> >
>> > 3) Ran bin/nutch crawl urls -solr http://localhost:8080/solr -depth 3
>> > -topN 5
>> >
>> > The urls and other setup are also done,
>> > but the keyword field is not getting populated.
>> >
>> > Please suggest what I am missing.
>> >
>>
>> --
>> *
>> *Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> http://twitter.com/digitalpebble
>>
>

