Oh right, thanks. It's because another application also added documents with an "id" field to the index (but the id there is constructed from more than just a URL).
I could index the URL into something like "nutch_id" and change ID_FIELD in org.apache.nutch.indexer.solr.SolrConstants. Not the best solution, though.

--
[Developer]
dkd Internet Service GmbH
development // kommunikation // design
Kaiserstraße 73
60329 Frankfurt/Main
fon: +49 69 2475218-0
fax: +49 69 2475218-99
e-mail: [email protected]
twitter: http://twitter.com/dkd_de
facebook: http://www.facebook.com/www.dkd.de
web: http://www.dkd.de
Court of registration: Amtsgericht Frankfurt am Main
Registration number: HRB 45590
Managing directors: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast, Christian Zabanski
Current projects:
http://www.spielwarenmesse-eg.de – Relaunch & Responsive Design (TYPO3)
http://www.horsch.com – Relaunch Website (TYPO3)
http://www.dosb.de – Refresh Website (TYPO3)

On 24.01.2012 at 14:18, Markus Jelsma wrote:
> Ah, this is a known problem which I cannot reproduce anymore.
>
> https://issues.apache.org/jira/browse/NUTCH-1100
>
> It's triggered because Solr returns something the SolrInputFormat of Nutch
> cannot deal with. Can you please run the query in a browser and see if you
> find anything unusual in the returned results?
>
> INFO: [core_en] webapp=/solr path=/select
> params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2}
> hits=52 status=0 QTime=2
>
> It's likely in the id field; the other three fields are highly unlikely to
> contain garbage.
>
> Thanks
>
> On Tuesday 24 January 2012 14:13:27 you wrote:
>> hadoop.log:
>>
>> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: content dest: content
>> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: site dest: site
>> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: title dest: teaser
>> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: boost dest: boost
>> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: tstamp dest: changed
>> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: tstamp dest: created
>> 2012-01-24 14:09:37,370 INFO solr.SolrWriter - Adding 2 documents
>> 2012-01-24 14:09:38,095 INFO solr.SolrIndexer - SolrIndexer: finished at 2012-01-24 14:09:38, elapsed: 00:00:02
>> 2012-01-24 14:09:38,097 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-01-24 14:09:38
>> 2012-01-24 14:09:38,097 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
>> 2012-01-24 14:09:38,457 WARN mapred.LocalJobRunner - job_local_0010
>> java.lang.NullPointerException
>>     at org.apache.hadoop.io.Text.encode(Text.java:388)
>>     at org.apache.hadoop.io.Text.set(Text.java:178)
>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:284)
>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:249)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>
>> Solr (running out of eclipse with jetty):
>>
>> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy onInit
>> INFO: SolrDeletionPolicy.onInit: commits:num=1
>>     commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
>> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy updateCommits
>> INFO: newest commit = 1326882792610
>> 24.01.2012 14:09:37 org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {add=[045756f6efde46c27a8e1016756bf99cc8153d51/nutch_external/http://www.dkd.de/, 5648ab376b909bc402c4ecbf45c26b4546e69f04/nutch_external/http://www.typo3-solr.com/]} 0 71
>> 24.01.2012 14:09:37 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=71
>> 24.01.2012 14:09:37 org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy onCommit
>> INFO: SolrDeletionPolicy.onCommit: commits:num=2
>>     commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
>>     commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_q,version=1326882792614,generation=26,filenames=[_1.frq, _2.tii, _c.tii, _c.fdx, _c.tvx, _1.fnm, _2.tvx, _c.fdt, _2.tvd, _c.tis, _c.nrm, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _c.prx, _2.fdt, _2.frq, _2.fdx, _2.fnm, _1.prx, _1.fdx, _2.tis, _1.tvf, _1.fdt, segments_q, _c.tvf, _c.tvd, _c.fnm, _1.tvd, _c.frq, _1.nrm, _2.nrm]
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy updateCommits
>> INFO: newest commit = 1326882792614
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher <init>
>> INFO: Opening Searcher@2a44fec1 main
>> 24.01.2012 14:09:38 org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: end_commit_flush
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
>>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@2a44fec1 main
>>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
>>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@2a44fec1 main
>>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
>>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@2a44fec1 main
>>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
>>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@2a44fec1 main
>>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
>> INFO: QuerySenderListener sending requests to Searcher@2a44fec1 main
>> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
>> INFO: QuerySenderListener done.
>> 24.01.2012 14:09:38 org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener buildSpellIndex
>> INFO: Building spell index for spell checker: default
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore registerSearcher
>> INFO: [core_en] Registered new searcher Searcher@2a44fec1 main
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher close
>> INFO: Closing Searcher@3d78cd7b main
>>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
>>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {commit=} 0 212
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=212
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=2
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=1
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/select params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2} hits=52 status=0 QTime=2
>>
>>> Please post the Nutch and Solr logs.
>>>
>>> On Tuesday 24 January 2012 13:46:25 Denis Sinner wrote:
>>>> Hello,
>>>>
>>>> I have set up a Nutch crawler and am trying to index into a Solr core
>>>> where information is written by other applications as well. The data
>>>> gets indexed, but I get the following error:
>>>>
>>>> SolrDeleteDuplicates: starting at 2012-01-24 12:59:43
>>>> SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
>>>> Exception in thread "main" java.io.IOException: Job failed!
>>>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>>>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:392)
>>>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:372)
>>>>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>>>>
>>>> If I index into an empty core on the same Solr server, I don't get this
>>>> exception. Any hints on how to solve it? I would be very thankful.
>>>>
>>>> Thanks,
>>>>
>>>> Denis
>
> --
> Markus Jelsma - CTO - Openindex
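
For the check Markus asks for (looking for anything unusual in the documents the dedup query returns), a rough sketch of the idea in plain Java follows. It is not Nutch or SolrJ code: the class name MissingFieldCheck, the method missingFields and the sample values are made up for illustration, and documents are modelled as simple maps. The only things taken from the thread are the four field names in the query (fl=id,boost,tstamp,digest) and the stack trace, where the NullPointerException in Text.set/Text.encode would be consistent with one of those fields being absent on documents written by another application. That reading is an assumption, not something the logs confirm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: reports which of the fields
// requested by the dedup query are absent (null) in a returned document.
class MissingFieldCheck {
    // Field list copied from the Solr log line: fl=id,boost,tstamp,digest
    static final List<String> REQUIRED = List.of("id", "boost", "tstamp", "digest");

    static List<String> missingFields(Map<String, Object> doc) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED) {
            if (doc.get(field) == null) {   // absent key or explicit null
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up document standing in for one written by another application:
        // it has an id but none of the Nutch-specific fields.
        Map<String, Object> foreignDoc = new HashMap<>();
        foreignDoc.put("id", "abc123/other_app/42");

        List<String> missing = missingFields(foreignDoc);
        if (!missing.isEmpty()) {
            System.out.println(foreignDoc.get("id") + " is missing " + missing);
        }
    }
}
```

Running the same kind of check against the real query results (for example via the XML or JSON response in a browser) should show quickly whether the documents from the other application lack boost, tstamp or digest.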

