You wrote it wrong. It should look like this:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika|metatags|js|swf)|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
Put that in your nutch-site.xml, and then rebuild with: ant clean runtime

Talat

[email protected] wrote:
>Hi Talat,
>No, I am not using the urlfilter-validator plugin. Here is my list of plugins:
>
><property>
>  <name>plugin.includes</name>
>  <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags|js|swf)|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
></property>
>
>Do I just need to change this to:
>
><property>
>  <name>plugin.includes</name>
>  <value>protocol-http|urlfilter-regex|parse|validator-(html|tika|metatags|js|swf)|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
></property>
>
>Thank you so much,
>
>Madhvi
>
>On 11/6/13 1:08 PM, "Talat UYARER" <[email protected]> wrote:
>
>>Hi Madhvi,
>>
>>Can you tell me which plugins are active in your nutch-site.xml? I am
>>not sure, but we have an issue similar to this: if your Solr returns
>>null, that could be the cause. Please check the data your Solr returns.
>>
>>You can look at https://issues.apache.org/jira/browse/NUTCH-1100
>>
>>If yours is the same, you should use the urlfilter-validator plugin.
>>
>>urlfilter-validator has lots of benefits; I described them in
>>http://mail-archives.apache.org/mod_mbox/nutch-user/201310.mbox/%[email protected]%3e
>>
>>Talat
>>
>>[email protected] wrote:
>>
>>>I am going to start my own thread rather than being under javozzo's
>>>thread :)!
>>>
>>>Hi,
>>>
>>>I am using Nutch 1.5.1 and Solr 3.6 and am having a problem with the
>>>SolrDeleteDuplicates command.
>>>Looking at the Hadoop logs, I am getting this error:
>>>
>>>java.lang.NullPointerException
>>>	at org.apache.hadoop.io.Text.encode(Text.java:388)
>>>	at org.apache.hadoop.io.Text.set(Text.java:178)
>>>	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
>>>	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
>>>	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
>>>	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
>>>	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>>	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>>>	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>>	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>>
>>>I also had another question, about updating Nutch to 1.6 or 1.7. I had
>>>tried updating to a newer version of Nutch but got an exception while
>>>deleting duplicates in Solr. After a lot of research online I found
>>>that a field had changed: a few said the digest field, and others said
>>>that the url field is no longer there. So here are my questions:
>>>1: Is there a newer Solr mapping file that needs to be used?
>>>2: Can the Solr index from 1.5.1 and an index from a newer version
>>>co-exist, or do we need to re-index from one version of Nutch?
>>>
>>>I will really appreciate any help with this.
>>>
>>>Thanks in advance,
>>>Madhvi
>>>
>>>Madhvi Arora
>>>AutomationDirect
>>>The #1 Best Mid-Sized Company to work for in
>>>Atlanta<http://www.ajc.com/business/topworkplaces/automationdirect-com-top-midsize-1421260.html> 2012
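
Following up on Talat's "please check your Solr return data": the NullPointerException in the quoted trace is thrown when Text.set() is handed a null field value. One way to look for such documents is a pure negative range query against Solr, which matches documents that have no value at all in a field. Below is a minimal sketch in Python that only builds the query URL; the host, port, and the "id" field name are assumptions based on Nutch's default schema, so adjust them for your install:

```python
from urllib.parse import urlencode

# Assumed Solr 3.x endpoint; change host/port/core for your install.
SOLR_SELECT = "http://localhost:8983/solr/select"

# "-id:[* TO *]" is a pure negative range query: it matches documents
# that have no value at all in the id field.
params = {"q": "-id:[* TO *]", "wt": "json", "rows": 10}
query_url = SOLR_SELECT + "?" + urlencode(params)
print(query_url)

# To actually run the check against a live Solr:
#   import urllib.request
#   print(urllib.request.urlopen(query_url).read())
```

If that query returns any documents, those are likely the ones SolrDeleteDuplicates chokes on, per NUTCH-1100.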

