You must upgrade Nutch's two SolrJ jars to 3.1.

On Tuesday 19 April 2011 14:59:28 Max Stricker wrote:
> Hi all,
>
> I am trying to run Nutch and Solr using the tutorial from [1].
> When indexing the content to Solr using the command
>
>   bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
>
> I see the following error:
>
>   SolrIndexer: starting at 2011-04-19 03:00:52
>   java.io.IOException: Job failed!
>
> The log indicates a RuntimeException when unmarshalling an object:
>
> 2011-04-19 03:01:26,630 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> 2011-04-19 03:01:26,630 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
> 2011-04-19 03:01:26,634 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-19 03:01:26,635 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-19 03:01:26,673 INFO solr.SolrMappingReader - source: content dest: content
> 2011-04-19 03:01:26,673 INFO solr.SolrMappingReader - source: site dest: site
> 2011-04-19 03:01:26,673 INFO solr.SolrMappingReader - source: title dest: title
> 2011-04-19 03:01:26,674 INFO solr.SolrMappingReader - source: host dest: host
> 2011-04-19 03:01:26,674 INFO solr.SolrMappingReader - source: segment dest: segment
> 2011-04-19 03:01:26,674 INFO solr.SolrMappingReader - source: boost dest: boost
> 2011-04-19 03:01:26,674 INFO solr.SolrMappingReader - source: digest dest: digest
> 2011-04-19 03:01:26,674 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> 2011-04-19 03:01:26,674 INFO solr.SolrMappingReader - source: url dest: id
> 2011-04-19 03:01:26,674 INFO solr.SolrMappingReader - source: url dest: url
> 2011-04-19 03:01:28,544 WARN mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: Invalid version or the data in not in 'javabin' format
>         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
>         at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-04-19 03:01:29,472 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
>
> Any ideas what could go wrong here? I strictly followed the mentioned tutorial.
> The index should be fine:
>
> m@ubuntu:~/Desktop/nutch$ bin/nutch readdb crawl/crawldb -stats
> CrawlDb statistics start: crawl/crawldb
> Statistics for CrawlDb: crawl/crawldb
> TOTAL urls:     284
> retry 0:        284
> min score:      0.0
> avg score:      0.030411972
> max score:      1.208
> status 1 (db_unfetched):        17
> status 2 (db_fetched):  227
> status 3 (db_gone):     40
> CrawlDb statistics: done
>
> regards,
> Max
>
> [1] http://wiki.apache.org/nutch/RunningNutchAndSolr
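In case it helps, the jar swap can be sketched like this. It is demonstrated below in a scratch directory so it is safe to run as-is; on a real install, NUTCH_LIB would be your Nutch lib/ directory, and the exact jar file names are assumptions that vary per Nutch/Solr release, so check what is actually in lib/ first.

```shell
# Sketch of the SolrJ jar upgrade, played out in a scratch directory.
# On a real install: NUTCH_LIB=<nutch>/lib, and the 3.1 jar comes from
# the dist/ directory of the Solr 3.1 download. Jar names are illustrative.
NUTCH_LIB=$(mktemp -d)

# Pretend this is the old SolrJ jar that shipped with Nutch:
touch "$NUTCH_LIB/apache-solr-solrj-1.4.1.jar"

# 1) Remove the old SolrJ jar(s):
rm -f "$NUTCH_LIB"/apache-solr-solrj-1.4*.jar

# 2) Drop in the 3.1 jar (in real life, cp from the Solr 3.1 dist/):
touch "$NUTCH_LIB/apache-solr-solrj-3.1.0.jar"

# Show the result:
ls "$NUTCH_LIB"
```

On a real install you would then rebuild the Nutch job file (e.g. with ant) so the upgraded jars actually end up on the MapReduce classpath. The javabin wire format changed between Solr 1.4 and 3.1, which is why a 1.4-era SolrJ client talking to a 3.1 server fails with exactly the "Invalid version or the data in not in 'javabin' format" error quoted above.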
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

