Ah, this is a known problem which I can no longer reproduce:
https://issues.apache.org/jira/browse/NUTCH-1100
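It's triggered when Solr returns something that Nutch's SolrInputFormat
cannot deal with. The NullPointerException in Text.set suggests that a
requested field comes back empty for at least one document. Can you please
run the query below in a browser (with a readable wt such as json or xml
instead of javabin) and see if you find anything unusual in the results?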
INFO: [core_en] webapp=/solr path=/select
params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2}
hits=52 status=0 QTime=2
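
To make the logged query easy to open in a browser, the parameters can be
turned into a URL. This is only a sketch: the host and core are taken from
the "Solr url" line in your hadoop.log, and wt=json plus indent=true are my
substitutions for the binary javabin format, not something Nutch sends.

```python
from urllib.parse import urlencode

# Assumption: core URL copied from the "Solr url" line in the quoted hadoop.log.
SOLR_CORE_URL = "http://192.168.0.47:8080/solr/core_en/"


def browser_query_url(core_url=SOLR_CORE_URL, rows=52):
    """Build a browser-readable variant of the query SolrDeleteDuplicates logged."""
    params = {
        "q": "*:*",
        "fl": "id,boost,tstamp,digest",
        "start": 0,
        "rows": rows,
        "wt": "json",       # javabin is binary; JSON (or XML) is readable
        "indent": "true",   # pretty-print the response
    }
    # Leave '*', ':' and ',' unescaped so the URL resembles the Solr log line.
    return core_url.rstrip("/") + "/select?" + urlencode(params, safe="*:,")


if __name__ == "__main__":
    print(browser_query_url())
```

Paste the printed URL into a browser, and raise rows if the core holds more
than the 52 documents reported in the log.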
It's likely in the id field; the other three fields are highly unlikely to
contain garbage.
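
If eyeballing 52 documents is tedious, a small script can flag the suspicious
ones. The sketch below assumes you saved the response with wt=json, so the
documents sit under response.docs; the function name and the sample data are
made up for illustration. It checks all four requested fields, not just id,
since the check costs nothing extra.

```python
import json

# The four fields requested by the dedup query in the Solr log.
REQUIRED_FIELDS = ("id", "boost", "tstamp", "digest")


def find_incomplete_docs(response_text, required=REQUIRED_FIELDS):
    """Return (position, id, missing_fields) for each document lacking a field."""
    docs = json.loads(response_text)["response"]["docs"]
    problems = []
    for pos, doc in enumerate(docs):
        # A field counts as missing when it is absent, null or an empty string.
        missing = [f for f in required if doc.get(f) in (None, "")]
        if missing:
            problems.append((pos, doc.get("id"), missing))
    return problems


# Made-up sample response: the second document has no digest field.
SAMPLE = json.dumps({
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "http://example.org/", "boost": 1.0,
             "tstamp": "2012-01-24T13:09:37Z", "digest": "abc123"},
            {"id": "pages/42", "boost": 1.0,
             "tstamp": "2012-01-24T13:09:37Z"},
        ],
    }
})

if __name__ == "__main__":
    for pos, doc_id, missing in find_incomplete_docs(SAMPLE):
        print(pos, doc_id, "is missing", ", ".join(missing))
```

Any document it reports is a candidate for what SolrInputFormat is failing on.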
Thanks
On Tuesday 24 January 2012 14:13:27 you wrote:
> hadoop.log:
>
> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: content dest: content
> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: site dest: site
> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: title dest: teaser
> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: boost dest: boost
> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: tstamp dest: changed
> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: tstamp dest: created
> 2012-01-24 14:09:37,370 INFO solr.SolrWriter - Adding 2 documents
> 2012-01-24 14:09:38,095 INFO solr.SolrIndexer - SolrIndexer: finished at 2012-01-24 14:09:38, elapsed: 00:00:02
> 2012-01-24 14:09:38,097 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-01-24 14:09:38
> 2012-01-24 14:09:38,097 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
> 2012-01-24 14:09:38,457 WARN mapred.LocalJobRunner - job_local_0010
> java.lang.NullPointerException
>     at org.apache.hadoop.io.Text.encode(Text.java:388)
>     at org.apache.hadoop.io.Text.set(Text.java:178)
>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:284)
>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:249)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
> Solr (running out of eclipse with jetty):
>
> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy onInit
> INFO: SolrDeletionPolicy.onInit: commits:num=1
> commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy updateCommits
> INFO: newest commit = 1326882792610
> 24.01.2012 14:09:37 org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {add=[045756f6efde46c27a8e1016756bf99cc8153d51/nutch_external/http://www.dkd.de/, 5648ab376b909bc402c4ecbf45c26b4546e69f04/nutch_external/http://www.typo3-solr.com/]} 0 71
> 24.01.2012 14:09:37 org.apache.solr.core.SolrCore execute
> INFO: [core_en] webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=71
> 24.01.2012 14:09:37 org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy onCommit
> INFO: SolrDeletionPolicy.onCommit: commits:num=2
> commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
> commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_q,version=1326882792614,generation=26,filenames=[_1.frq, _2.tii, _c.tii, _c.fdx, _c.tvx, _1.fnm, _2.tvx, _c.fdt, _2.tvd, _c.tis, _c.nrm, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _c.prx, _2.fdt, _2.frq, _2.fdx, _2.fnm, _1.prx, _1.fdx, _2.tis, _1.tvf, _1.fdt, segments_q, _c.tvf, _c.tvd, _c.fnm, _1.tvd, _c.frq, _1.nrm, _2.nrm]
> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy updateCommits
> INFO: newest commit = 1326882792614
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@2a44fec1 main
> 24.01.2012 14:09:38 org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@2a44fec1 main
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@2a44fec1 main
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@2a44fec1 main
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@2a44fec1 main
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener sending requests to Searcher@2a44fec1 main
> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener done.
> 24.01.2012 14:09:38 org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener buildSpellIndex
> INFO: Building spell index for spell checker: default
> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore registerSearcher
> INFO: [core_en] Registered new searcher Searcher@2a44fec1 main
> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher close
> INFO: Closing Searcher@3d78cd7b main
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> 24.01.2012 14:09:38 org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {commit=} 0 212
> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> INFO: [core_en] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=212
> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=2
> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=1
> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> INFO: [core_en] webapp=/solr path=/select params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2} hits=52 status=0 QTime=2
>
> > Please post the Nutch and Solr logs.
> >
> > On Tuesday 24 January 2012 13:46:25 Denis Sinner wrote:
> >> Hello,
> >>
> >> I have set up a Nutch crawler and am trying to index into a Solr core
> >> that other applications write to as well. The data gets indexed, but
> >> I get the following error:
> >>
> >> SolrDeleteDuplicates: starting at 2012-01-24 12:59:43
> >> SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
> >> Exception in thread "main" java.io.IOException: Job failed!
> >>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:392)
> >>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:372)
> >>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
> >>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> >>
> >> If I index into an empty core on the same Solr server, I don't get this
> >> exception. Any hints on how to solve it? I would be very thankful.
> >>
> >> Thanks,
> >>
> >> Denis
--
Markus Jelsma - CTO - Openindex