Re: Priorize links in Fetching Step

Lewis John Mcgibbney Sun, 01 May 2016 12:40:42 -0700

Hi Yulio,

Marcus wrote the MimeAdaptiveFetchSchedule [0] implementation for exactly
this purpose.
You can enable and configure it as described in [1].



[0]
https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/MimeAdaptiveFetchSchedule.java
[1]
https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L487-L492
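
A minimal sketch of what [1] points at, assuming the property names found in nutch-default.xml on master (they may differ or be absent in Nutch 1.9, so check your own copy). The overrides would go in conf/nutch-site.xml:

```xml
<!-- conf/nutch-site.xml: overrides for the defaults in nutch-default.xml -->
<configuration>
  <!-- Assumption: replaces the default fetch schedule with the MIME-aware one from [0] -->
  <property>
    <name>db.fetch.schedule.class</name>
    <value>org.apache.nutch.crawl.MimeAdaptiveFetchSchedule</value>
  </property>
  <!-- Assumption: file in conf/ that lists per-MIME-type rates; the name shown is the
       default I would expect, verify it against your nutch-default.xml -->
  <property>
    <name>db.fetch.schedule.mime.file</name>
    <value>adaptive-mimetypes.txt</value>
  </property>
</configuration>
```

As far as I understand it, this schedule adjusts how quickly the re-fetch interval grows or shrinks for each MIME type, so images and documents can be given different rates than text/html in the file named above.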

On Sun, May 1, 2016 at 7:43 AM, <user-digest-h...@nutch.apache.org> wrote:

> From: Yulio Aleman Jimenez <yuli...@uci.cu>
> To: user@nutch.apache.org
> Cc:
> Date: Fri, 29 Apr 2016 16:47:32 -0400 (CDT)
> Subject: Priorize links in Fetching Step
> Hi.
>
> I'm using Nutch 1.9 with Solr 4.10 in a local environment.
> I need a way to prioritize some links in the fetching step by
> filtering the new links identified in the last crawls according to some
> criteria, for example the extension of the resource. The goal is to
> prioritize images, documents, etc., before HTML pages in the crawling process.
>
> Is there any property in nutch-site.xml or any plugin capable of doing this?
> How can I do this?
>
> I welcome any suggestions, or source code snippets for creating a new
> plugin for Nutch.
>
> Best regards
>
> --
> Ing. Yulio Aleman Jimenez
> Dpto. Soluciones Informáticas para Internet. CIDI
> Universidad de las Ciencias Informáticas (UCI)
>


-- 
*Lewis*
