Hi all,

I am trying to figure out how to debug this situation, or at least what I
am missing in the differences between two Solr cores. Let me explain:

I started using Nutch to crawl and index our page to a Solr core, core1,
and it worked fine. The job completed as it should.

I then wanted to start indexing our page to our Solr core0, along with
other items that we want to index.

The indexing itself is not the problem: it crawls and indexes fine. But on
core0 the job keeps failing on the delete duplicates task at the end of the
index run, with the error below.

From what I can tell, schema.xml and solrconfig.xml are the same across
core0 and core1, with one exception: in core0 the url field is no longer
required. The other indexed items (records from our db, not the crawled web
pages) don't have a url, so the id field is the standard, required field
across all of them.

Could this be what's causing the problem? What is the deduper trying to do,
and what is getting in its way? And how might I get this straightened out
for core0? Thanks!


The hadoop.log error:
-------------------------------------------------------------------------------------------------------
2013-07-26 16:55:17,797 INFO solr.SolrIndexWriter - Indexing 157 documents
2013-07-26 16:55:30,407 INFO solr.SolrMappingReader - source: content dest: content
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: title dest: title
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: host dest: host
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: segment dest: segment
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: boost dest: boost
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: digest dest: digest
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: url dest: id
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: url dest: url
2013-07-26 16:55:31,590 INFO indexer.IndexingJob - Indexer: finished at 2013-07-26 16:55:31, elapsed: 00:00:19
2013-07-26 16:55:31,593 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-07-26 16:55:31
2013-07-26 16:55:31,593 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://<domain>:<port>/solr/core0/
2013-07-26 16:55:32,043 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2013-07-26 16:55:32,043 WARN mapred.LocalJobRunner - job_local1142877999_0055
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.io.Text.encode(Text.java:388)
    at org.apache.hadoop.io.Text.set(Text.java:178)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:679)
-------------------------------------------------------------------------------------------------------
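
My unconfirmed guess from the trace: the NullPointerException comes out of
Text.set inside the SolrDeleteDuplicates record reader, which is what I
would expect if a string read from a Solr document were null. The db
records in core0 are not produced by Nutch, so they may lack fields that
the crawled pages have, such as digest from the mapping above. Below is a
minimal Java sketch of the check I have in mind. It does not use the Nutch
or Solr APIs; the class name, method names and sample documents are made up
for illustration, and the idea that digest is the field involved is only my
assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Solr.
class MissingDigestCheck {

    // Returns the ids of documents that have no value for the given field.
    static List<String> idsMissingField(List<Map<String, Object>> docs, String field) {
        List<String> missing = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (doc.get(field) == null) {
                missing.add(String.valueOf(doc.get("id")));
            }
        }
        return missing;
    }

    // Builds a Solr query string matching documents with no value in the field.
    static String missingFieldQuery(String field) {
        return "-" + field + ":[* TO *]";
    }

    public static void main(String[] args) {
        // A crawled page as Nutch would index it (made-up values).
        Map<String, Object> page = new HashMap<>();
        page.put("id", "http://example.com/");
        page.put("digest", "abc123");

        // A db record indexed by some other process, with no digest (assumption).
        Map<String, Object> record = new HashMap<>();
        record.put("id", "db-42");

        System.out.println(idsMissingField(Arrays.asList(page, record), "digest"));
        System.out.println(missingFieldQuery("digest"));
    }
}
```

Running the query string from missingFieldQuery against core0 should show
whether any documents there have no digest. If the db records turn up, that
would at least be consistent with the guess above.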



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deleting-Duplicates-works-fine-on-one-solr-core-but-not-on-antother-Nutch-1-5-tp4080931.html
Sent from the Nutch - User mailing list archive at Nabble.com.
