What are you using for your crawl command line? I remember trying to get mine to work, and there was a line that wasn't very clear in the tutorial:
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5

where you had to include the -solr location for it to index the files. If each is working separately, then I would guess the problem is somewhere in the connection -- that was my problem.

Jerry E. Craig, Jr.

-----Original Message-----
From: John R. Brinkema [mailto:[email protected]]
Sent: Monday, August 01, 2011 11:46 AM
To: [email protected]
Subject: Nutch-1.3 + Solr 3.3.0 = fail

Friends,

I am having the worst time getting Nutch and Solr to play together nicely. I downloaded and installed the current binaries for both Nutch and Solr. I edited the nutch-site.xml file to include:

<property>
  <name>http.agent.name</name>
  <value>Solr/Nutch Search</value>
</property>
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|tika)|index-basic|query-(basic|stemmer|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
  <name>http.content.limit</name>
  <value>65536</value>
</property>
<property>
  <name>searcher.dir</name>
  <value>/opt/SolrSearch</value>
</property>

I installed and tested them according to each of their respective tutorials; in other words, I believe each is working separately. I crawled a URL, and the 'readdb -stats' report shows that I have successfully collected some links. Most of the links are to '.pdf' files. I followed the instructions to link Nutch and Solr, e.g. copying the Nutch schema to become the Solr schema. When I run the bin/nutch solrindex ... command, I get the following error:

java.io.IOException: Job failed!
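For comparison, a two-step sketch of the same workflow: crawl first, then index into Solr explicitly. The directory names (crawl/, urls/) are examples, and the positional argument order shown for solrindex is what I recall from the Nutch 1.3 tutorial -- check `bin/nutch solrindex` with no arguments for the usage string on your build:

```shell
# Step 1: crawl without indexing (writes crawldb, linkdb, segments under crawl/)
bin/nutch crawl urls -dir crawl -depth 3 -topN 5

# Step 2: push the crawled segments to Solr in a separate step.
# Assumed Nutch 1.3 usage: solrindex <solr url> <crawldb> <linkdb> <segment ...>
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
```

Running the index step separately at least tells you whether the failure is in the crawl or in the Nutch-to-Solr handoff.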
When I look in the log/hadoop.log file I see:

2011-08-01 13:10:00,086 INFO solr.SolrMappingReader - source: content dest: content
2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: site dest: site
2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: title dest: title
2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: host dest: host
2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: segment dest: segment
2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: boost dest: boost
2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: digest dest: digest
2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: url dest: id
2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: url dest: url
2011-08-01 13:10:00,537 WARN mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Document [null] missing required field: id

Document [null] missing required field: id

request: http://localhost:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-08-01 13:10:01,050 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

The same error appears in the Solr log.
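The SolrMappingReader lines show Nutch's solrindex-mapping.xml mapping the crawl field url to the Solr field id, so "missing required field: id" usually means a document reached Solr without that mapping applied, or the Solr schema in use is not the one copied from Nutch. A sketch of what the relevant parts of the schema (Solr's conf/schema.xml, copied from Nutch's conf/) are expected to look like -- the exact attributes here are illustrative, compare against your actual file:

```xml
<!-- In <fields>: the id field Solr is complaining about.
     required="true" plus the uniqueKey below is what makes
     a document without an id fail the update request. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<!-- Near the end of schema.xml: -->
<uniqueKey>id</uniqueKey>
```

If the schema Solr actually loaded is its stock example schema rather than the Nutch one, or was copied without restarting/reloading Solr, you can get exactly this mismatch even though both sides work on their own.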
I have tried the 'sync solrj libraries' fix; that is, I copied apache-solr-solrj-3.3.0.jar from the Solr lib to the Nutch lib, with no effect. Since I am running binaries, I did not, of course, run 'ant job'. Is that the magic? Any suggestions?

