Try changing uniqueKey from id to url as below in schema.xml and restart Solr:

<uniqueKey>url</uniqueKey>
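For reference, the relevant schema.xml fragment would look something like this after the change (the field attributes shown are illustrative; check them against the field declaration in your own copy of the Nutch schema):

```xml
<!-- schema.xml fragment (illustrative): the uniqueKey field must be
     indexed, stored, and present on every document -->
<field name="url" type="string" stored="true" indexed="true" required="true"/>

<uniqueKey>url</uniqueKey>
```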
If that still does not work, that means you have an empty url. We can fix that.

On Mon, Aug 1, 2011 at 12:45 PM, John R. Brinkema <[email protected]> wrote:

> Friends,
>
> I am having the worst time getting nutch and solr to play together nicely.
>
> I downloaded and installed the current binaries for both nutch and solr. I
> edited the nutch-site.xml file to include:
>
> <property>
>   <name>http.agent.name</name>
>   <value>Solr/Nutch Search</value>
> </property>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|urlfilter-regex|parse-(text|html|tika)|index-basic|query-(basic|stemmer|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
> <property>
>   <name>http.content.limit</name>
>   <value>65536</value>
> </property>
> <property>
>   <name>searcher.dir</name>
>   <value>/opt/SolrSearch</value>
> </property>
>
> I installed them and tested them according to each of their respective
> tutorials; in other words I believe each is working, separately. I crawled
> a url and the 'readdb -stats' report shows that I have successfully
> collected some links. Most of the links are to '.pdf' files.
>
> I followed the instructions to link nutch and solr; e.g. copy the nutch
> schema to become the solr schema.
>
> When I run the bin/nutch solrindex ... command I get the following error:
>
> java.io.IOException: Job failed!
>
> When I look in the log/hadoop.log file I see:
>
> 2011-08-01 13:10:00,086 INFO solr.SolrMappingReader - source: content dest: content
> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: site dest: site
> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: title dest: title
> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: host dest: host
> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: segment dest: segment
> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: boost dest: boost
> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: digest dest: digest
> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: url dest: id
> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: url dest: url
> 2011-08-01 13:10:00,537 WARN mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Document [null] missing required field: id
>
> Document [null] missing required field: id
>
> request: http://localhost:8983/solr/update?wt=javabin&version=2
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-08-01 13:10:01,050 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
>
> The same error appears in the solr log.
>
> I have tried the 'sync solrj libraries' fix; that is, I copied
> apache-solr-solrj-3.3.0.jar from the solr lib to the nutch lib with no
> effect. Since I am running binaries, I, of course, did not run ant job. Is
> that the magic?
>
> Any suggestions?
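For what it's worth, the "missing required field: id" rejection is just Solr enforcing the schema's required uniqueKey on every document added; this hypothetical sketch of that check (not Solr's actual code) shows why a Nutch document whose url value never made it into id fails:

```python
# Hypothetical sketch of the required-field validation that produces
# "Document [null] missing required field: id" -- not Solr's actual code.

REQUIRED_FIELDS = {"id"}  # schema.xml: <uniqueKey>id</uniqueKey>, marked required

def validate(doc: dict) -> None:
    """Reject a document missing any required field, as Solr does on update."""
    missing = REQUIRED_FIELDS - doc.keys()
    if missing:
        raise ValueError(
            f"Document [{doc.get('id')}] missing required field: "
            + ", ".join(sorted(missing))
        )

# A Nutch doc where 'url' was never mapped to 'id' is rejected:
nutch_doc = {"url": "http://example.com/a.pdf", "title": "A"}
try:
    validate(nutch_doc)
except ValueError as e:
    print(e)  # prints: Document [None] missing required field: id
```

Switching the uniqueKey to url (or ensuring the url-to-id mapping actually fires) sidesteps exactly this check.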

