On 07.10.24 13:55, Markus Jelsma wrote:
Hello,

Refetch interval is mainly controlled by the configured FetchSchedule
class. But it can also be controlled in a custom ScoringFilter in
updateDbScore(). We use both.

Nutch comes with an AdaptiveFetchSchedule that shortens the interval for
records that change more often, usually link/hub/overview pages, and
lengthens it for unchanging pages. This works well for simple sites.
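
A minimal sketch of the adaptive idea, not the actual Nutch source: the
interval shrinks by a fixed rate each time a page is found modified, grows
by another rate each time it is found unchanged, and is clamped between a
minimum and a maximum. The class name, method name and the four constants
below are assumptions for illustration only; the real values come from the
db.fetch.schedule.adaptive.* properties in your Nutch configuration.

```java
public class AdaptiveIntervalSketch {
    // Hypothetical values, chosen for illustration; check your own
    // nutch-default.xml / nutch-site.xml for the configured settings.
    static final double INC_RATE = 0.4;            // growth when page is unchanged
    static final double DEC_RATE = 0.2;            // shrink when page has changed
    static final double MIN_INTERVAL = 60.0;       // seconds
    static final double MAX_INTERVAL = 31536000.0; // seconds (365 days)

    /** Returns the next fetch interval in seconds, clamped to [MIN, MAX]. */
    static double nextInterval(double interval, boolean modified) {
        if (modified) {
            interval *= (1.0 - DEC_RATE); // changed page: refetch sooner
        } else {
            interval *= (1.0 + INC_RATE); // unchanged page: refetch later
        }
        return Math.max(MIN_INTERVAL, Math.min(MAX_INTERVAL, interval));
    }

    public static void main(String[] args) {
        double interval = 30 * 24 * 3600.0; // start at 30 days
        // A page that changes on every fetch converges toward the minimum.
        for (int fetch = 1; fetch <= 5; fetch++) {
            interval = nextInterval(interval, true);
            System.out.printf("fetch %d: %.0f s%n", fetch, interval);
        }
    }
}
```

With rates like these the interval moves multiplicatively, so how quickly
a URL reaches either bound depends on the starting interval and on how
many consecutive fetches agree.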

Oh, this sounds great! It would mean Nutch learns on its own which URLs
to fetch more or less often and adapts the intervals accordingly.

Now I am interested in the inner mechanics. Where can I learn more about
this?
How long does the algorithm need to settle on one decision or the other?
What are the boundaries (minimum and maximum fetch intervals)?
Is there a prerequisite that plugins need to fulfill for Nutch to
recognize whether the content has changed?
