Solr's schema has its own version, which is 1.4 in current 3.x. See the inline comments: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/example/solr/conf/schema.xml?view=markup
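The "schema version" here is the `version` attribute on the root element of Solr's schema.xml, meaning the copy of Nutch's schema that now lives in Solr's conf directory. It is not a Nutch or Solr release number. Below is a minimal sketch that assumes the copied schema is named "nutch"; keep whatever name your copy already has, and read the inline comments in the linked example schema before you change the version:

```xml
<!-- Sketch only: root element of the schema.xml copied from Nutch into
     Solr's conf directory. "name" is just a display label (the value
     "nutch" is an assumption); "version" is Solr's schema syntax version,
     which the Solr 3.x example schema sets to 1.4. -->
<schema name="nutch" version="1.4">
  <!-- existing <types>, <fields> and <uniqueKey> sections stay unchanged -->
</schema>
```

Only the attribute changes. The field definitions Nutch depends on stay the same, so this is a line-level edit to the existing copy and does not require merging the whole file.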
> Markus,
>
> What do you mean by "update the schema version"? Nutch's or Solr's?
> And are we talking about simple copies or line-by-line merges? And what
> about the schema copy specified in the RunningNutchAndSolr tutorial?
>
> This sounds like the answer; I just don't know enough to do it. tnx.
>
> On 8/8/2011 8:04 PM, Markus Jelsma wrote:
> > 3.3 will work perfectly, as there are no changes to the javabin format.
> > However, one should update the schema version to reflect recent changes
> > in branch 3.4-dev. This branch will likely be released before
> > Nutch 1.4, which should be compatible with the most recent stable
> > Solr release.
> >
> >> Glad it worked for you on Solr 3.2. I did try Nutch 1.3 and Solr 3.3;
> >> however, I have not updated my blog for Solr 3.3 yet. ;-)
> >>
> >> have fun!
> >>
> >> On Mon, Aug 8, 2011 at 1:57 PM, John R. Brinkema wrote:
> >>> On 8/2/2011 11:21 PM, Way Cool wrote:
> >>>> Try changing uniqueKey from id to url in schema.xml as below, and
> >>>> restart Solr:
> >>>> <uniqueKey>url</uniqueKey>
> >>>>
> >>>> If that still does not work, that means you have an empty url. We
> >>>> can fix that.
> >>>>
> >>>>
> >>>> On Mon, Aug 1, 2011 at 12:45 PM, John R. Brinkema wrote:
> >>>>> Friends,
> >>>>>
> >>>>> I am having the worst time getting nutch and solr to play together
> >>>>> nicely.
> >>>>>
> >>>>> I downloaded and installed the current binaries for both nutch and
> >>>>> solr.
> >>>>>
> >>>>> I edited the nutch-site.xml file to include:
> >>>>>
> >>>>> <property>
> >>>>>   <name>http.agent.name</name>
> >>>>>   <value>Solr/Nutch Search</value>
> >>>>> </property>
> >>>>> <property>
> >>>>>   <name>plugin.includes</name>
> >>>>>   <value>protocol-http|urlfilter-regex|parse-(text|html|tika)|index-basic|query-(basic|stemmer|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> >>>>> </property>
> >>>>> <property>
> >>>>>   <name>http.content.limit</name>
> >>>>>   <value>65536</value>
> >>>>> </property>
> >>>>> <property>
> >>>>>   <name>searcher.dir</name>
> >>>>>   <value>/opt/SolrSearch</value>
> >>>>> </property>
> >>>>>
> >>>>> I installed and tested each according to its own tutorial; in other
> >>>>> words, I believe each is working separately. I crawled a url, and the
> >>>>> 'readdb -stats' report shows that I have successfully collected some
> >>>>> links. Most of the links are to '.pdf' files.
> >>>>>
> >>>>> I followed the instructions to link nutch and solr, i.e. copying the
> >>>>> nutch schema to become the solr schema.
> >>>>>
> >>>>> When I run the bin/nutch solrindex ... command, I get the following
> >>>>> error:
> >>>>>
> >>>>> java.io.IOException: Job failed!
> >>>>>
> >>>>> When I look in the log/hadoop.log file I see:
> >>>>>
> >>>>> 2011-08-01 13:10:00,086 INFO solr.SolrMappingReader - source: content dest: content
> >>>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: site dest: site
> >>>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: title dest: title
> >>>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: host dest: host
> >>>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: segment dest: segment
> >>>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: boost dest: boost
> >>>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: digest dest: digest
> >>>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> >>>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: url dest: id
> >>>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: url dest: url
> >>>>> 2011-08-01 13:10:00,537 WARN mapred.LocalJobRunner - job_local_0001
> >>>>> org.apache.solr.common.SolrException: Document [null] missing required field: id
> >>>>>
> >>>>> Document [null] missing required field: id
> >>>>>
> >>>>> request: http://localhost:8983/solr/update?wt=javabin&version=2
> >>>>>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
> >>>>>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >>>>>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >>>>>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> >>>>>     at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
> >>>>>     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
> >>>>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> >>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> >>>>> 2011-08-01 13:10:01,050 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> >>>>>
> >>>>> The same error appears in the solr log.
> >>>>>
> >>>>> I have tried the 'sync solrj libraries' fix; that is, I copied
> >>>>> apache-solr-solrj-3.3.0.jar from the solr lib to the nutch lib, with
> >>>>> no effect. Since I am running binaries, I, of course, did not run
> >>>>> ant job. Is that the magic?
> >>>>>
> >>>>> Any suggestions?
> >>>
> >>> Update from the trenches ....
> >>>
> >>> I followed Way Cool's suggestion (now called Dr. Cool since he has
> >>> been so helpful) of using Nutch 1.3 and Solr 3.2 ... which worked just
> >>> fine.
> >>>
> >>> I am off using this pair until I get a breather, and then I will try
> >>> Nutch 1.3 and Solr 3.3 again, this time with Dr. Cool's latest
> >>> suggestion.
> >>>
> >>> Thanks to all. /jb
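The SolrMappingReader lines in the quoted hadoop.log excerpt come from Nutch's Solr field-mapping file. Below is a sketch of that file, rebuilt from those log lines. It assumes the run used the stock conf/solrindex-mapping.xml. The use of `<copyField>` for the url-to-url line and the trailing `<uniqueKey>` element are recalled from the stock file and not shown in the log, so compare this against your own copy before relying on it:

```xml
<!-- Sketch reconstructed from the SolrMappingReader log lines; element
     layout is an assumption, check it against your Nutch conf directory. -->
<mapping>
  <fields>
    <field dest="content" source="content"/>
    <field dest="site" source="site"/>
    <field dest="title" source="title"/>
    <field dest="host" source="host"/>
    <field dest="segment" source="segment"/>
    <field dest="boost" source="boost"/>
    <field dest="digest" source="digest"/>
    <field dest="tstamp" source="tstamp"/>
    <!-- "source: url dest: id": the page url is the only value for Solr's id -->
    <field dest="id" source="url"/>
    <!-- "source: url dest: url": the url is also kept in its own field -->
    <copyField source="url" dest="url"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
```

In this mapping, url is the only source for id. A document that reaches Solr without a url would therefore have no id, which is one way to get "Document [null] missing required field: id". That is consistent with Way Cool's remark about an empty url.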

