Hi all, I am trying to figure out how to debug this situation, or work out what I am missing between two different Solr cores. Let me explain:
I started using Nutch to crawl and index our pages into a Solr core, core1, and it worked fine: the job completed as it should. I then wanted to start indexing our pages into core0 instead, along with other items that we want to index. The indexing itself is not the problem; it crawls and indexes fine. But on core0 the job consistently fails on the delete-duplicates task at the end of indexing, with the error below.

From what I can tell, the schema.xml and solrconfig.xml files are the same across core0 and core1, except that in core0 the url field is no longer required, because the other indexed items (records from our db, not the crawled web pages) don't have a url. The id field is therefore the standard required field across all of them. Could this be what is causing the problem? What is the deduper trying to do, and what is getting in its way? And how might I get this straightened out for core0? Thanks!

The hadoop.log error:
-------------------------------------------------------------------------------------------------------
2013-07-26 16:55:17,797 INFO solr.SolrIndexWriter - Indexing 157 documents
2013-07-26 16:55:30,407 INFO solr.SolrMappingReader - source: content dest: content
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: title dest: title
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: host dest: host
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: segment dest: segment
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: boost dest: boost
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: digest dest: digest
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: url dest: id
2013-07-26 16:55:30,444 INFO solr.SolrMappingReader - source: url dest: url
2013-07-26 16:55:31,590 INFO indexer.IndexingJob - Indexer: finished at 2013-07-26 16:55:31, elapsed: 00:00:19
2013-07-26 16:55:31,593 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-07-26 16:55:31
2013-07-26 16:55:31,593 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://<domain>:<port>/solr/core0/
2013-07-26 16:55:32,043 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2013-07-26 16:55:32,043 WARN mapred.LocalJobRunner - job_local1142877999_0055
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.io.Text.encode(Text.java:388)
    at org.apache.hadoop.io.Text.set(Text.java:178)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:679)
-------------------------------------------------------------------------------------------------------
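A note on reading the trace: the NullPointerException is thrown from Text.set, called by the record reader inside SolrDeleteDuplicates, which suggests the reader got a null string back for some document it fetched from core0. Since the db records were not indexed by Nutch, one guess is that they lack a field the deduper reads from every document in the core. The sketch below is a minimal diagnostic under that assumption, not a confirmed explanation: it asks Solr for documents that have no value for a given field. The choice of `digest` as the field to check is my assumption (the log only shows it as one of the mapped fields), and the function names are hypothetical.

```python
import json
import sys
import urllib.parse
import urllib.request


def missing_field_query_url(core_url, field, rows=10):
    """Build a Solr select URL matching documents that have no value for `field`."""
    params = {
        # Match everything, then exclude documents where the field has any value.
        "q": "*:* -%s:[* TO *]" % field,
        "fl": "id",
        "rows": str(rows),
        "wt": "json",
    }
    return core_url.rstrip("/") + "/select?" + urllib.parse.urlencode(params)


def find_docs_missing_field(core_url, field, rows=10):
    """Return (count, sample ids) of documents in the core that lack `field`."""
    url = missing_field_query_url(core_url, field, rows)
    with urllib.request.urlopen(url) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    response = body["response"]
    return response["numFound"], [doc.get("id") for doc in response["docs"]]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: check_field.py <solr core url> [field]")
    else:
        # "digest" is an assumed default; pass another field name to check it instead.
        field = sys.argv[2] if len(sys.argv) > 2 else "digest"
        count, sample_ids = find_docs_missing_field(sys.argv[1], field)
        print("%d documents have no '%s' field; sample ids: %s" % (count, field, sample_ids))
```

Run against core0 and then core1, a non-zero count on core0 only would point at the db records as the documents the deduper trips over. A zero count on both would rule this guess out.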
--
View this message in context: http://lucene.472066.n3.nabble.com/Deleting-Duplicates-works-fine-on-one-solr-core-but-not-on-antother-Nutch-1-5-tp4080931.html
Sent from the Nutch - User mailing list archive at Nabble.com.

