Hi Talat,

I can reproduce this exception. It starts happening as soon as I index documents from outside Nutch; SolrDeleteDuplicates works fine as long as the whole Solr index came from Nutch. I haven't yet found out specifically which field is causing it, but looking at the issue below, it might be because the digest field is missing:

https://issues.apache.org/jira/browse/NUTCH-1100
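
To illustrate the suspected failure mode: the stack trace in the original report shows the NullPointerException raised in Text.set, called from the record reader's next() in SolrDeleteDuplicates, which would be consistent with a null value (for example a missing digest) being read from a Solr document. The sketch below is only an illustration under that assumption. It uses plain Java maps as stand-ins for Solr documents and does not call Nutch or SolrJ; the class name MissingDigestCheck, the method idsMissingField, and the sample ids are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MissingDigestCheck {

    // Returns the "id" of every document that has no (non-null) value for `field`.
    // Documents are modelled as plain maps; this is an assumption for illustration,
    // not the Solr or Nutch API.
    public static List<String> idsMissingField(List<Map<String, Object>> docs, String field) {
        List<String> missing = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (doc.get(field) == null) {
                missing.add(String.valueOf(doc.get("id")));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A document shaped like one indexed by Nutch (hypothetical values).
        Map<String, Object> fromNutch = new HashMap<>();
        fromNutch.put("id", "http://example.com/a");
        fromNutch.put("digest", "0123abcd");

        // A document indexed from outside Nutch, with no digest field.
        Map<String, Object> fromElsewhere = new HashMap<>();
        fromElsewhere.put("id", "external-1");

        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(fromNutch);
        docs.add(fromElsewhere);

        // Prints the ids that would hand a null digest to the dedup step.
        System.out.println(idsMissingField(docs, "digest")); // prints [external-1]
    }
}
```

Running the same kind of check against each field the dedup job reads would help narrow down whether digest is really the culprit or some other field is also missing.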

Can it be some other field? Also, there is a patch for the digest field. How should I apply it? Any help would be great!

Madhvi

On 11/6/13 2:19 PM, "Talat UYARER" <[email protected]> wrote:

>You wrote it wrong. It should look like this:
>
><property>
><name>plugin.includes</name>
><value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika|metatags|js|swf)|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
></property>
>
>Put this in nutch-site.xml, and after that rebuild with ant clean runtime.
>
>Talat
>
>[email protected] wrote:
>
>>Hi Talat,
>>No, I am not using the urlfilter-validator plugin. Here is my list of plugins:
>>
>><property>
>>  <name>plugin.includes</name>
>>  <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags|js|swf)|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>></property>
>>
>>Do I just need to change this to:
>>
>><property>
>><name>plugin.includes</name>
>><value>protocol-http|urlfilter-regex|parse|validator-(html|tika|metatags|js|swf)|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>></property>
>>
>>Thank you so much,
>>
>>Madhvi
>>
>>On 11/6/13 1:08 PM, "Talat UYARER" <[email protected]> wrote:
>>
>>>Hi Madhvi,
>>>
>>>Can you tell me which plugins are active in your nutch-site.xml? I am
>>>not sure, but we had a similar issue. If your Solr returns null, it is
>>>because of this issue. Please check the data your Solr returns.
>>>
>>>You can look at https://issues.apache.org/jira/browse/NUTCH-1100
>>>
>>>If yours is the same, you should use the urlfilter-validator plugin.
>>>
>>>Urlfilter-validator has lots of benefits, as I described in
>>>http://mail-archives.apache.org/mod_mbox/nutch-user/201310.mbox/%3c5265BC2[email protected]%3e
>>>
>>>Talat
>>>
>>>[email protected] wrote:
>>>
>>>>I am going to start my own thread rather than being under javozzo's
>>>>thread :)!
>>>>
>>>>Hi,
>>>>
>>>>I am using Nutch 1.5.1 and Solr 3.6 and am having a problem with the
>>>>SolrDeleteDuplicates command. Looking at the Hadoop logs, I am getting
>>>>this error:
>>>>
>>>>java.lang.NullPointerException
>>>>at org.apache.hadoop.io.Text.encode(Text.java:388)
>>>>at org.apache.hadoop.io.Text.set(Text.java:178)
>>>>at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
>>>>at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
>>>>at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
>>>>at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
>>>>at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>>>at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>>>>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>>>at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>>>
>>>>I also had another question about updating Nutch to 1.6 and 1.7. I
>>>>tried updating to a newer version of Nutch but got an exception while
>>>>deleting duplicates in Solr. After a lot of research online, I found
>>>>that a field had changed. A few people said it was the digest field,
>>>>and others said that the url field is no longer there. So here are my
>>>>questions:
>>>>1: Is there a newer Solr mapping file that needs to be used?
>>>>2: Can the Solr index from 1.5.1 and an index from a newer version
>>>>co-exist, or do we need to re-index from one version of Nutch?
>>>>
>>>>I would really appreciate any help with this.
>>>>
>>>>Thanks in advance,
>>>>Madhvi
>>>>
>>>>Madhvi Arora
>>>>AutomationDirect
>>>>The #1 Best Mid-Sized Company to work for in
>>>>Atlanta<http://www.ajc.com/business/topworkplaces/automationdirect-com-top-midsize-1421260.html> 2012

