Hi,

I am using Nutch 1.5.1 and Solr 1.6 and am having a problem with the
SolrDeleteDuplicates command. The Hadoop logs show the following error:

java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.set(Text.java:178)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)


I also have a question about upgrading Nutch to 1.6 or 1.7. I tried
upgrading to a newer version of Nutch but got an exception while deleting
duplicates in Solr. After a lot of research online, I found that a field had
changed. Some said it was the digest field, and others said that the url
field is no longer there. So here are my questions:
1: Is there a newer Solr mapping file that needs to be used?
2: Can the Solr index from 1.5.1 and an index from a newer version co-exist,
or do we need to re-index with one version of Nutch?

I would really appreciate any help with this.


Thanks in advance,
Madhvi




