Hello, I have been trying to index metatags from Nutch into Solr for several days without success (Nutch 1.16, Solr 7.3.1).

I have followed this description: https://cwiki.apache.org/confluence/display/nutch/IndexMetatags
My nutch-site.xml is included at the end of this message.

I created the core following this description: https://cwiki.apache.org/confluence/display/nutch/NutchTutorial/
By the way, without the metatags everything works fine.

Before creating the core I deleted managed-schema.xml and inserted my metatag fields into schema.xml in the configsets directory of the core:

<field name="metatag.SITdescription" type="text_general" stored="true" indexed="true" multiValued="true"/>
<field name="metatag.SITkeywords" type="text_general" stored="true" indexed="true" multiValued="true"/>

First question: After creating the core I see a managed-schema.xml file and a schema.xml.bak file in the conf directory of the core. Sorry, I am new to this, but I believe I do not want managed-schema.xml? (See the description above.)

Anyway, when I run the crawl everything is OK until the index is created. Then I end up with this error:

org.apache.solr.common.SolrException: copyField dest :'metatag.SITdescription_str' is not an explicit field and doesn't match a dynamicField.
    at org.apache.solr.schema.IndexSchema.registerCopyField(IndexSchema.java:902)
    at org.apache.solr.schema.ManagedIndexSchema.addCopyFields(ManagedIndexSchema.java:784)

There is no copyField instruction for metatag.SITdescription in managed-schema.xml. I even created a field "metatag.SITdescription_str" in managed-schema.xml, which did not help.

Can you help me, please?

Best regards,
Martin

nutch-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>SIT_NUTCH_SPIDER</value>
  </property>
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
    <description>If true, outlinks leading from a page to external hosts
    will be ignored. This is an effective way to limit the crawl to include
    only initially injected hosts, without creating complex URLFilters.
    </description>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    <description>Regular expression naming plugin directory names to
    include. Any plugin not matching this expression is excluded.
    By default Nutch includes plugins to crawl HTML and various other
    document formats via HTTP/HTTPS and indexing the crawled content
    into Solr. More plugins are available to support more indexing
    backends, to fetch ftp:// and file:// URLs, for focused crawling,
    and many other use cases.
    </description>
  </property>
  <property>
    <name>http.robot.rules.whitelist</name>
    <value>sitlux02.sit.de</value>
    <description>Comma separated list of hostnames or IP addresses to
    ignore robot rules parsing for.
    </description>
  </property>
  <property>
    <name>metatags.names</name>
    <value>SITdescription,SITkeywords,SITcategory,SITintern</value>
    <description>Names of the metatags to extract, separated by ','.
    Use '*' to extract all metatags. Prefixes the names with 'metatag.'
    in the parse-metadata. For instance to index description and keywords,
    you need to activate the plugin index-metadata and set the value of the
    parameter 'index.parse.md' to 'metatag.description,metatag.keywords'.
    </description>
  </property>
  <property>
    <name>index.parse.md</name>
    <value>metatag.SITdescription,metatag.SITkeywords,metatag.SITcategory,metatag.SITintern</value>
    <description>Comma-separated list of keys to be taken from the parse
    metadata to generate fields. Can be used e.g. for 'description' or
    'keywords' provided that these values are generated by a parser
    (see parse-metatags plugin)
    </description>
  </property>
  <property>
    <name>index.metadata</name>
    <value>metatag.SITdescription,metatag.SITkeywords,metatag.SITcategory,metatag.SITintern</value>
    <description>Comma-separated list of keys to be taken from the
    metadata to generate fields. Can be used e.g. for 'description' or
    'keywords' provided that these values are generated by a parser
    (see parse-metatags plugin), and property 'metatags.names'.
    </description>
  </property>
</configuration>

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html