Thanks for your quick reply.

I will try using scoreupdater next time =)

Unfortunately, -addDays would not work for me because I want to refetch only specified domains, not the whole db (my first question was not phrased correctly). Another problem with -addDays and FetchSchedule is that I have to use a generate.topN lower than the size of the part to refetch (there are some time restrictions on index updates), so I can't determine when to stop using -addDays.

On Fri 14 Oct 2011 04:52:33 PM MSK, Markus Jelsma wrote:
There are no tools for resetting the score but it would not be hard to modify
an existing tool for that e.g. WebGraph's scoreupdater tool. You can force
refetch by using the -addDays switch with the generator tool. It'll add
numDays to the current time to generate records that are not yet due for
fetch.
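
To illustrate the -addDays behaviour described above, here is a minimal sketch in Python. It is not Nutch code: the record layout, the URLs, the dates and the names is_due and select are all made up for illustration. It only assumes what the message says, namely that numDays is added to the current time before each record's fetch time is compared against it.

```python
from datetime import datetime, timedelta

def is_due(fetch_time, now, add_days=0):
    """Hypothetical check: a record is selected when its scheduled
    fetch time is not later than the current time shifted by add_days."""
    return fetch_time <= now + timedelta(days=add_days)

def select(records, now, add_days=0):
    """Return the sorted URLs of all records that would be selected."""
    return sorted(url for url, t in records.items() if is_due(t, now, add_days))

# Made-up records: URL -> next scheduled fetch time.
now = datetime(2011, 10, 14)
records = {
    "http://example.com/a": datetime(2011, 10, 10),  # already due
    "http://example.com/b": datetime(2011, 11, 1),   # due in 18 days
    "http://example.org/c": datetime(2011, 12, 20),  # due in 67 days
}

print(select(records, now))               # only the record that is already due
print(select(records, now, add_days=30))  # also picks up the one due within 30 days
```

In this sketch a larger add_days simply widens the time window, so it selects records from every domain alike.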

On Friday 14 October 2011 14:48:47 Sergey A Volkov wrote:
Hi!

Is there any good way to modify all crawldb records? (e.g. drop score or
force refetch).

I'm currently using Nutch 1.2 and, as far as I can see, the only way to
do this is to write my own MapReduce task for every modification, or to
change the CrawlDb updater and write my own extension point.

Sergey Volkov.


