Don't delete the crawl db; that's pointless. You can either delete the whole 
segment or remove everything in it except crawl_generate and try again. Only 
delete the segment if you have successfully crawled another segment after it, 
because that later segment will contain the same URLs.
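
Roughly, the second option on a local filesystem could look like the sketch 
below (the segment path is only an example; on HDFS the same can be done 
through the Hadoop FileSystem API):

    import java.io.File;

    public class CleanSegment {
        public static void main(String[] args) {
            // Example segment directory - adjust to your crawl layout
            File segment = new File("crawl/segments/20120908104300");
            for (File part : segment.listFiles()) {
                // Keep only crawl_generate; drop crawl_fetch, content,
                // crawl_parse, parse_data and parse_text
                if (!"crawl_generate".equals(part.getName())) {
                    deleteRecursively(part);
                }
            }
        }

        private static void deleteRecursively(File f) {
            if (f.isDirectory()) {
                for (File child : f.listFiles()) {
                    deleteRecursively(child);
                }
            }
            f.delete();
        }
    }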

-----Original message-----
> From:Alaak <al...@gmx.de>
> Sent: Sat 08-Sep-2012 10:43
> To: user@nutch.apache.org
> Cc: Markus Jelsma <markus.jel...@openindex.io>
> Subject: Re: Keeping an externally created field in solr.
> 
> Hi,
> 
> OK, thanks. Then I guess I will follow your last proposal and read the 
> value from the Solr index if the URL is already there.
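> 
> Roughly, that lookup could be done with SolrJ along these lines (assuming 
> the page URL is the unique key "id"; "myfield" is just a placeholder name):
> 
>     import org.apache.solr.client.solrj.SolrQuery;
>     import org.apache.solr.client.solrj.SolrServer;
>     import org.apache.solr.client.solrj.response.QueryResponse;
>     import org.apache.solr.client.solrj.util.ClientUtils;
>     import org.apache.solr.common.SolrDocumentList;
> 
>     public class ExistingFieldLookup {
>         // Returns the current value of "myfield" for the given URL,
>         // or null if the URL has not been indexed yet.
>         static Object lookup(SolrServer solr, String url) throws Exception {
>             SolrQuery q = new SolrQuery();
>             q.setQuery("id:" + ClientUtils.escapeQueryChars(url));
>             q.setFields("myfield");
>             q.setRows(1);
>             QueryResponse rsp = solr.query(q);
>             SolrDocumentList docs = rsp.getResults();
>             return docs.isEmpty() ? null : docs.get(0).getFieldValue("myfield");
>         }
>     }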
> 
> On Sat 08 Sep 2012 00:11:41 CEST, Markus Jelsma wrote:
> >
> > No, but you could modify the indexer to do so. Or make use of Solr's 
> > new capability of updating specific fields. You could also modify 
> > that indexer plugin to fetch the value for that field from some source 
> > you have prior to indexing. I think the latter is the easiest to build, 
> > but it only works for fields specifically set by Nutch.
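> >
> > For the second option: Solr 4's atomic updates let you change a single 
> > field of an already indexed document while the other stored fields are 
> > kept (note that this requires the fields to be stored). A minimal SolrJ 
> > sketch, with placeholder URL, core and field names:
> >
> >     import java.util.HashMap;
> >     import java.util.Map;
> >
> >     import org.apache.solr.client.solrj.SolrServer;
> >     import org.apache.solr.client.solrj.impl.HttpSolrServer;
> >     import org.apache.solr.common.SolrInputDocument;
> >
> >     public class PartialFieldUpdate {
> >         public static void main(String[] args) throws Exception {
> >             // Placeholder core URL
> >             SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
> >
> >             SolrInputDocument doc = new SolrInputDocument();
> >             // Must match the unique key of the already indexed page
> >             doc.addField("id", "http://www.example.com/some-page");
> >
> >             // "set" replaces only this field; other stored fields are kept
> >             Map<String, Object> partial = new HashMap<String, Object>();
> >             partial.put("set", "externally-computed-value");
> >             doc.addField("myfield", partial);
> >
> >             solr.add(doc);
> >             solr.commit();
> >         }
> >     }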
> >
> > -----Original message-----
> >>
> >> From:Alaak <al...@gmx.de>
> >> Sent: Sat 08-Sep-2012 00:08
> >> To: user@nutch.apache.org
> >> Subject: Keeping an externally created field in solr.
> >>
> >> Hi,
> >>
> >> I have an external program which changes a field for some websites
> >> within my Solr index. Nutch sets this field to a default value via a
> >> plugin when it indexes a page. My problem is that Nutch also resets the
> >> field for already indexed pages when it updates them. Is there any way
> >> to tell Nutch not to touch that field if it already exists in the Solr
> >> index?
> >>
> >> Thanks and Regards
> 
