Hello, my name is Antony and I'm new to Apache Nutch and Solr. I want to crawl my website, so I downloaded Nutch, and crawling works fine. Now I would like to integrate Nutch with Solr. I'm running this on my Unix system and trying to follow this tutorial: http://wiki.apache.org/nutch/NutchTutorial but it won't work for me.

Running Solr without Nutch is no problem; I can post documents to Solr with post.jar. What I want to do is post my Nutch crawl to Solr.

If I copy the schema.xml from Nutch to the apache-solr-4.0.0/example/solr/collection1/conf directory and restart Solr (java -jar start.jar), I get schema parsing errors, although Solr itself still starts. (Is this the correct directory to copy my schema to?)

Nov 8, 2012 9:40:33 AM org.apache.solr.schema.IndexSchema readSchema
INFO: Schema name=nutch
Nov 8, 2012 9:40:33 AM org.apache.solr.core.CoreContainer create
SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
    ...
Nov 8, 2012 9:40:33 AM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
    ...

If I don't copy the schema and push my Nutch crawl to Solr, I get the following error:

SolrIndexer: starting at 2012-11-08 10:49:02
Indexing 5 documents
java.io.IOException: Job failed!
SolrDeleteDuplicates: starting at 2012-11-08 10:49:47
SolrDeleteDuplicates: Solr url: http://photon:8983/solr/

And this is taken from the Solr log:

org.apache.solr.common.SolrException: ERROR: [doc= http://e-docs/infrastructure/cpuload_monitor.html] unknown field 'host'

What should I do, or what am I missing? I hope you can help me.

Best regards,
Antony
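
To narrow down both errors, here is a rough sketch of a check that could be run against a schema.xml. It rests on two guesses, not confirmed facts: that "multiple points" is the complaint Java raises when asked to parse a number with more than one decimal point (for example a schema version attribute such as "1.5.1"), and that "unknown field 'host'" simply means the active schema defines no field with that name. The SAMPLE_SCHEMA text and the inspect_schema helper are hypothetical and only for illustration; the sample is not the real Nutch file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down schema for illustration only; not the real Nutch schema.xml.
SAMPLE_SCHEMA = """<schema name="nutch" version="1.5.1">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="host" type="string" stored="false" indexed="true"/>
  </fields>
</schema>"""


def inspect_schema(xml_text, required_fields=("host",)):
    """Report whether the version attribute is a plain number and which fields are missing."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    try:
        # Assumption: Solr reads this attribute as a floating-point number.
        float(version)
        version_is_number = True
    except (TypeError, ValueError):
        version_is_number = False
    # Collect every <field name="..."> declared anywhere in the schema.
    defined = {field.get("name") for field in root.iter("field")}
    missing = [name for name in required_fields if name not in defined]
    return {
        "version": version,
        "version_is_number": version_is_number,
        "missing_fields": missing,
    }


report = inspect_schema(SAMPLE_SCHEMA)
print(report)
```

To check a real file, the text of schema.xml could be read from disk and passed to inspect_schema in place of SAMPLE_SCHEMA. If the version turns out not to be a plain number, that would be one candidate explanation for the parsing failure; if 'host' is reported missing, Solr is probably still using its default example schema.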