On Thursday 26 January 2012 12:58:33 Lewis John Mcgibbney wrote:
> Hi Kaveh,
>
> I'm not sure if your problem is the same at all.
> Your problem stems from the Solr mapping configuration used by
> AnchorIndexingFilter in the index-anchor plugin.
Huh? That plugin has nothing to do with Solr mapping, or with Solr at all.

> If this works properly then you should see a list of all of the source -->
> destination field mappings,

Mappings are not loaded for deduplication.

> this unfortunately is not the case and needs to
> be resolved before you can progress.
>
> Maybe once this is sorted you can address the MR NPE

Did anything strange pop up in the Solr logs? We should get rid of this dedup implementation; it's flawed and seems to break everywhere.

>
> hth
>
> On Thu, Jan 26, 2012 at 1:02 AM, kaveh minooie wrote:
> > Hi, I think I am having a similar problem. This is what I got in the
> > hadoop.log file (Nutch log file) after running this command:
> >
> > bin/nutch crawl urls/ -solr http://solr3:8983/solr/core8 -dir mycrawldir -threads 2 -depth 2 -topN 20
> >
> > and here is the result (from hadoop.log):
> >
> > 2012-01-25 16:42:37,174 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> > 2012-01-25 16:42:40,151 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> > 2012-01-25 16:42:40,151 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
> > 2012-01-25 16:42:40,151 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> > 2012-01-25 16:42:40,167 WARN solr.SolrMappingReader - java.net.MalformedURLException
> > 2012-01-25 16:42:40,341 INFO solr.SolrWriter - Indexing 21 documents
> > 2012-01-25 16:42:44,137 INFO solr.SolrIndexer - SolrIndexer: finished at 2012-01-25 16:42:44, elapsed: 00:00:34
> > 2012-01-25 16:42:44,143 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-01-25 16:42:44
> > 2012-01-25 16:42:44,144 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://solr3:8983/solr/core8
> > 2012-01-25 16:42:44,295 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> > 2012-01-25 16:42:44,296 WARN mapred.LocalJobRunner - job_local_0015
> > java.lang.NullPointerException
> >     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrRecord.readSolrDocument(SolrDeleteDuplicates.java:131)
> >     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:271)
> >     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
> >     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
> >     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> >
> > What is it talking about in this line:
> >
> > 2012-01-25 16:42:44,295 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> >
> > What output path is it talking about?
> >
> > (I am running this locally, not on Hadoop.)
> >
> > On 01/24/2012 05:13 AM, Denis Sinner wrote:
> >> hadoop.log:
> >>
> >> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: content dest: content
> >> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: site dest: site
> >> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: title dest: teaser
> >> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: boost dest: boost
> >> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: tstamp dest: changed
> >> 2012-01-24 14:09:37,156 INFO solr.SolrMappingReader - source: tstamp dest: created
> >> 2012-01-24 14:09:37,370 INFO solr.SolrWriter - Adding 2 documents
> >> 2012-01-24 14:09:38,095 INFO solr.SolrIndexer - SolrIndexer: finished at 2012-01-24 14:09:38, elapsed: 00:00:02
> >> 2012-01-24 14:09:38,097 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-01-24 14:09:38
> >> 2012-01-24 14:09:38,097 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
> >> 2012-01-24 14:09:38,457 WARN mapred.LocalJobRunner - job_local_0010
> >> java.lang.NullPointerException
> >>     at org.apache.hadoop.io.Text.encode(Text.java:388)
> >>     at org.apache.hadoop.io.Text.set(Text.java:178)
> >>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:284)
> >>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:249)
> >>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
> >>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
> >>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> >>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> >>
> >> Solr (running out of Eclipse with Jetty):
> >>
> >> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy onInit
> >> INFO: SolrDeletionPolicy.onInit: commits:num=1
> >>     commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
> >> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy updateCommits
> >> INFO: newest commit = 1326882792610
> >> 24.01.2012 14:09:37 org.apache.solr.update.processor.LogUpdateProcessor finish
> >> INFO: {add=[045756f6efde46c27a8e1016756bf99cc8153d51/nutch_external/http://www.dkd.de/, 5648ab376b909bc402c4ecbf45c26b4546e69f04/nutch_external/http://www.typo3-solr.com/]} 0 71
> >> 24.01.2012 14:09:37 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=71
> >> 24.01.2012 14:09:37 org.apache.solr.update.DirectUpdateHandler2 commit
> >> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy onCommit
> >> INFO: SolrDeletionPolicy.onCommit: commits:num=2
> >>     commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
> >>     commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_q,version=1326882792614,generation=26,filenames=[_1.frq, _2.tii, _c.tii, _c.fdx, _c.tvx, _1.fnm, _2.tvx, _c.fdt, _2.tvd, _c.tis, _c.nrm, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _c.prx, _2.fdt, _2.frq, _2.fdx, _2.fnm, _1.prx, _1.fdx, _2.tis, _1.tvf, _1.fdt, segments_q, _c.tvf, _c.tvd, _c.fnm, _1.tvd, _c.frq, _1.nrm, _2.nrm]
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy updateCommits
> >> INFO: newest commit = 1326882792614
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher <init>
> >> INFO: Opening Searcher@2a44fec1 main
> >> 24.01.2012 14:09:38 org.apache.solr.update.DirectUpdateHandler2 commit
> >> INFO: end_commit_flush
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
> >> INFO: QuerySenderListener sending requests to Searcher@2a44fec1 main
> >> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
> >> INFO: QuerySenderListener done.
> >> 24.01.2012 14:09:38 org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener buildSpellIndex
> >> INFO: Building spell index for spell checker: default
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore registerSearcher
> >> INFO: [core_en] Registered new searcher Searcher@2a44fec1 main
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher close
> >> INFO: Closing Searcher@3d78cd7b main
> >>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> >>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.update.processor.LogUpdateProcessor finish
> >> INFO: {commit=} 0 212
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=212
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=2
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=1
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/select params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2} hits=52 status=0 QTime=2
> >
> > --
> > Kaveh Minooie
> >
> > www.plutoz.com

--
Markus Jelsma - CTO - Openindex
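For anyone hitting the same NullPointerException: the last Solr request in Denis's log shows SolrDeleteDuplicates asking for `fl=id,boost,tstamp,digest`, while his mapping sends `tstamp` to `changed` and `created`, and his excerpt shows no mapping for `digest`. One plausible cause (an assumption, not confirmed in this thread) is that one of those four fields comes back null from Solr and is then passed on, for example to Hadoop's `Text.set(String)`, which throws exactly as in the second trace. The standalone sketch below checks a document for the four fields. The class `DedupFieldCheck` and method `missingFields` are hypothetical (not part of Nutch), documents are modelled as plain maps rather than SolrJ objects, and the field values are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not part of Nutch: reports which of the fields that
 * SolrDeleteDuplicates requests (fl=id,boost,tstamp,digest in the Solr log)
 * are absent or null in a document returned by Solr. Documents are modelled
 * as plain maps so the sketch runs without SolrJ on the classpath.
 */
public class DedupFieldCheck {

    // Field list copied from the /select request in the Solr log.
    static final String[] DEDUP_FIELDS = {"id", "boost", "tstamp", "digest"};

    /** Returns the dedup fields that are absent or null in the given document. */
    static List<String> missingFields(Map<String, Object> doc) {
        List<String> missing = new ArrayList<>();
        for (String field : DEDUP_FIELDS) {
            if (doc.get(field) == null) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Document shaped after Denis's mapping (illustrative values): tstamp
        // was mapped to "changed"/"created", and no digest mapping is visible.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "045756f6efde46c27a8e1016756bf99cc8153d51/nutch_external/http://www.dkd.de/");
        doc.put("boost", 1.0f);
        doc.put("changed", "illustrative-date");
        doc.put("created", "illustrative-date");

        List<String> missing = missingFields(doc);
        if (!missing.isEmpty()) {
            // Handing one of these nulls to Hadoop's Text.set(String) would
            // plausibly produce the NullPointerException seen in the traces.
            System.out.println("Missing dedup fields: " + missing);
        }
    }
}
```

Run as-is, it prints `Missing dedup fields: [tstamp, digest]`. If the real documents in your core show the same gap, either store those fields under their original names in the Solr schema or skip the dedup step.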

