Oh right, thanks. It's because other applications have also added documents
with an "id" field to the index (but the id there is constructed from more
than just a URL).

I could index the URL into something like "nutch_id" and change ID_FIELD in
org.apache.nutch.indexer.solr.SolrConstants, though that's not the best
solution.
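
For anyone hitting the same exception, here is a rough sketch of the check
suggested in the quoted message below: look at the four fields the dedup query
requests (fl=id,boost,tstamp,digest) and flag documents where any of them is
missing. It assumes the results have already been fetched and models each
document as a plain Map; the class DedupFieldCheck and the method
missingFields are hypothetical names of mine, not part of Nutch or SolrJ, and
the sample values in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Nutch or SolrJ API: documents are modelled as plain maps.
class DedupFieldCheck {

    // Fields requested by the dedup query seen in the Solr log.
    static final List<String> DEDUP_FIELDS =
            Arrays.asList("id", "boost", "tstamp", "digest");

    // Returns the dedup fields that are absent or null in the given document.
    static List<String> missingFields(Map<String, Object> doc) {
        List<String> missing = new ArrayList<>();
        for (String field : DEDUP_FIELDS) {
            if (doc.get(field) == null) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative document as a crawler might write it (values are invented).
        Map<String, Object> crawled = new HashMap<>();
        crawled.put("id", "http://www.dkd.de/");
        crawled.put("boost", 1.0f);
        crawled.put("tstamp", "20120124140937");
        crawled.put("digest", "example-digest");

        // Illustrative document from another application that only sets an id.
        Map<String, Object> foreign = new HashMap<>();
        foreign.put("id", "some/other/id/scheme");

        List<Map<String, Object>> docs = Arrays.asList(crawled, foreign);
        for (Map<String, Object> doc : docs) {
            List<String> missing = missingFields(doc);
            if (!missing.isEmpty()) {
                System.out.println(doc.get("id") + " is missing " + missing);
            }
        }
    }
}
```

Any document this flags is a candidate for the null value that ends up in
Text.set() in the stack trace; whether that is really the trigger here is
still an assumption on my side.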

-- 

[Developer]

dkd Internet Service GmbH
development // kommunikation // design
Kaiserstraße 73
60329 Frankfurt/Main

fon:  +49 69 2475218-0
fax:  +49 69 2475218-99
e-mail: [email protected]
twitter: http://twitter.com/dkd_de
facebook: http://www.facebook.com/www.dkd.de
web: http://www.dkd.de

Court of registration: Amtsgericht Frankfurt am Main
Registration number: HRB 45590
Managing directors: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast,
Christian Zabanski

On 24.01.2012 at 14:18, Markus Jelsma wrote:

> Ah, this is a known problem which I cannot reproduce anymore.
> 
> https://issues.apache.org/jira/browse/NUTCH-1100
> 
> It's triggered because Solr returns something the SolrInputFormat of Nutch 
> cannot deal with. Can you please run the query in a browser and see if you 
> find anything unusual in the returned results?
> 
> INFO: [core_en] webapp=/solr path=/select 
> params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2} 
> hits=52 status=0 QTime=2 
> 
> It's likely in the id field; the other three fields are highly unlikely to
> contain garbage.
> 
> Thanks
> 
> 
> 
> On Tuesday 24 January 2012 14:13:27 you wrote:
>> hadoop.log:
>> 
>> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: content dest: content
>> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: site dest: site
>> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: title dest: teaser
>> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: boost dest: boost
>> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: tstamp dest: changed
>> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: tstamp dest: created
>> 2012-01-24 14:09:37,370 INFO  solr.SolrWriter - Adding 2 documents
>> 2012-01-24 14:09:38,095 INFO  solr.SolrIndexer - SolrIndexer: finished at 2012-01-24 14:09:38, elapsed: 00:00:02
>> 2012-01-24 14:09:38,097 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-01-24 14:09:38
>> 2012-01-24 14:09:38,097 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
>> 2012-01-24 14:09:38,457 WARN  mapred.LocalJobRunner - job_local_0010
>> java.lang.NullPointerException
>>      at org.apache.hadoop.io.Text.encode(Text.java:388)
>>      at org.apache.hadoop.io.Text.set(Text.java:178)
>>      at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:284)
>>      at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:249)
>>      at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>>      at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>>      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> 
>> Solr (running out of eclipse with jetty):
>> 
>> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy onInit
>> INFO: SolrDeletionPolicy.onInit: commits:num=1
>>      commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
>> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy updateCommits
>> INFO: newest commit = 1326882792610
>> 24.01.2012 14:09:37 org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {add=[045756f6efde46c27a8e1016756bf99cc8153d51/nutch_external/http://www.dkd.de/, 5648ab376b909bc402c4ecbf45c26b4546e69f04/nutch_external/http://www.typo3-solr.com/]} 0 71
>> 24.01.2012 14:09:37 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=71
>> 24.01.2012 14:09:37 org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy onCommit
>> INFO: SolrDeletionPolicy.onCommit: commits:num=2
>>      commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
>>      commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_q,version=1326882792614,generation=26,filenames=[_1.frq, _2.tii, _c.tii, _c.fdx, _c.tvx, _1.fnm, _2.tvx, _c.fdt, _2.tvd, _c.tis, _c.nrm, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _c.prx, _2.fdt, _2.frq, _2.fdx, _2.fnm, _1.prx, _1.fdx, _2.tis, _1.tvf, _1.fdt, segments_q, _c.tvf, _c.tvd, _c.fnm, _1.tvd, _c.frq, _1.nrm, _2.nrm]
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy updateCommits
>> INFO: newest commit = 1326882792614
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher <init>
>> INFO: Opening Searcher@2a44fec1 main
>> 24.01.2012 14:09:38 org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: end_commit_flush
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
>>      fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@2a44fec1 main
>>      fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
>>      filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@2a44fec1 main
>>      filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
>>      queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@2a44fec1 main
>>      queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
>>      documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@2a44fec1 main
>>      documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
>> INFO: QuerySenderListener sending requests to Searcher@2a44fec1 main
>> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
>> INFO: QuerySenderListener done.
>> 24.01.2012 14:09:38 org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener buildSpellIndex
>> INFO: Building spell index for spell checker: default
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore registerSearcher
>> INFO: [core_en] Registered new searcher Searcher@2a44fec1 main
>> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher close
>> INFO: Closing Searcher@3d78cd7b main
>>      fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>>      filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>>      queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
>>      documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
>> 24.01.2012 14:09:38 org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {commit=} 0 212
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=212
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=2
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=1
>> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
>> INFO: [core_en] webapp=/solr path=/select params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2} hits=52 status=0 QTime=2
>> 
>>> Please post the Nutch and Solr logs.
>>> 
>>> On Tuesday 24 January 2012 13:46:25 Denis Sinner wrote:
>>>> Hello,
>>>> 
>>>> I have set up a Nutch crawler and am trying to index into a Solr core
>>>> that other applications write to as well. The data gets indexed, but I
>>>> get the following error:
>>>> 
>>>> SolrDeleteDuplicates: starting at 2012-01-24 12:59:43
>>>> SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
>>>> Exception in thread "main" java.io.IOException: Job failed!
>>>>    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>>>>    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:392)
>>>>    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:372)
>>>>    at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
>>>>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>>>> 
>>>> If I index into an empty core on the same Solr server, I don't get this
>>>> exception. Any hints on how to solve it? I would be very thankful.
>>>> 
>>>> Thanks,
>>>> 
>>>> Denis
> 
> -- 
> Markus Jelsma - CTO - Openindex
> 
