Thank you for catching my mistake. I will try this out.
Madhvi
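
One way to see the difference between the two plugin.includes values quoted below is to treat each as a regular expression and test plugin ids against it. This is only a sketch in Python, not Nutch code. It assumes the pattern has to match the whole plugin id, and is_included is a hypothetical helper name used just for this illustration.

```python
import re

# plugin.includes values copied from the messages quoted below
CORRECTED = (
    "protocol-http|urlfilter-(regex|validator)|parse-(html|tika|metatags|js|swf)"
    "|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)"
)
PROPOSED = (
    "protocol-http|urlfilter-regex|parse|validator-(html|tika|metatags|js|swf)"
    "|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_id, includes):
    """Hypothetical helper: True if the whole plugin id matches the pattern.

    Assumption: the pattern is applied as a full match against each plugin id.
    """
    return re.fullmatch(includes, plugin_id) is not None


# Print, for each plugin id, whether the corrected and the proposed value include it
for plugin_id in ("urlfilter-validator", "urlfilter-regex", "parse-html"):
    print(plugin_id, is_included(plugin_id, CORRECTED), is_included(plugin_id, PROPOSED))
```

Under that assumption, the proposed value splits into the separate alternatives "parse" and "validator-(...)", so it includes neither urlfilter-validator nor parse-html, while the corrected value includes both.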



On 11/6/13 2:19 PM, "Talat UYARER" <[email protected]> wrote:

>You wrote it incorrectly. It should look like this:
>
><property>
><name>plugin.includes</name>
><value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika|metatags|js|swf)|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
></property>
>
>Put this in nutch-site.xml, and after that rebuild with
>ant clean runtime
>
>Talat
>
>[email protected] wrote:
>
>>Hi Talat,
>>No, I am not using the urlfilter-validator plugin. Here is my list of
>>plugins:
>>
>><property>
>>  <name>plugin.includes</name>
>>  <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags|js|swf)|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>></property>
>>
>>
>>Do I just need to change this to:
>>
>><property>
>><name>plugin.includes</name>
>><value>protocol-http|urlfilter-regex|parse|validator-(html|tika|metatags|js|swf)|index-(basic|anchor|metadata|more)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>></property>
>>
>>Thank you so much,
>>
>>
>>
>>Madhvi
>>
>>
>>
>>
>>
>>
>>
>>On 11/6/13 1:08 PM, "Talat UYARER" <[email protected]> wrote:
>>
>>>Hi Madhvi,
>>>
>>>Can you tell me which plugins are active in your nutch-site.xml? I am
>>>not sure, but we have an issue similar to this. If your Solr returns
>>>null, it will be because of this issue. Please check the data your
>>>Solr returns.
>>>
>>>You can look at https://issues.apache.org/jira/browse/NUTCH-1100
>>>
>>>If yours is the same, you should use the urlfilter-validator plugin.
>>>
>>>Urlfilter-validator has lots of benefits. I described them in
>>>http://mail-archives.apache.org/mod_mbox/nutch-user/201310.mbox/%3c5265B
>>>C2
>>>[email protected]%3e
>>>
>>>Talat
>>>
>>>[email protected] wrote:
>>>
>>>>I am going to start my own thread rather than being under javozzo's
>>>>thread :)!
>>>>
>>>>Hi,
>>>>
>>>>
>>>>I am using Nutch 1.5.1 and Solr 3.6 and am having a problem with the
>>>>SolrDeleteDuplicates command. Looking at the Hadoop logs, I am getting
>>>>this error:
>>>>
>>>>java.lang.NullPointerException
>>>>at org.apache.hadoop.io.Text.encode(Text.java:388)
>>>>at org.apache.hadoop.io.Text.set(Text.java:178)
>>>>at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
>>>>at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
>>>>at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
>>>>at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
>>>>at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>>>at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>>>>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>>>at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>>>
>>>>
>>>>I also have a question about updating Nutch to 1.6 and 1.7. I tried
>>>>updating to a newer version of Nutch but got an exception while
>>>>deleting duplicates in Solr. After a lot of research online, I found
>>>>that a field had changed. A few people said it was the digest field,
>>>>and others said that the url field is no longer there. So here are my
>>>>questions:
>>>>1: Is there a newer Solr mapping file that needs to be used?
>>>>2: Can the Solr index from 1.5.1 and an index from a newer version
>>>>co-exist, or do we need to re-index with one version of Nutch?
>>>>
>>>>I will really appreciate any help with this.
>>>>
>>>>
>>>>Thanks in advance,
>>>>Madhvi
>>>>
>>>>Madhvi Arora
>>>>AutomationDirect
>>>>The #1 Best Mid-Sized Company to work for in
>>>>Atlanta<http://www.ajc.com/business/topworkplaces/automationdirect-com-top-midsize-1421260.html> 2012
>>>>
>>
