Hi Yulio,

Marcus wrote the MimeAdaptiveFetchSchedule [0] implementation for exactly this purpose. You can enable it through the configuration property described in [1].
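As a rough sketch only: the property names and the file name below are from my recollection of nutch-default.xml and are assumptions, so please verify them against [1] for your Nutch version. The override in your nutch-site.xml would look something like this:

```xml
<!-- Assumed property name: selects the FetchSchedule implementation. -->
<property>
  <name>db.fetch.schedule.class</name>
  <value>org.apache.nutch.crawl.MimeAdaptiveFetchSchedule</value>
</property>

<!-- Assumed property name and file name: a file in conf/ holding the
     per-MIME-type rates used by the schedule. -->
<property>
  <name>db.fetch.schedule.mime.file</name>
  <value>adaptive-mimetypes.txt</value>
</property>
```

If I remember correctly, that file lists one MIME type per line followed by its increment and decrement rates, separated by tabs, so you can give images and documents different values from text/html.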
[0] https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/MimeAdaptiveFetchSchedule.java
[1] https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L487-L492

On Sun, May 1, 2016 at 7:43 AM, <user-digest-h...@nutch.apache.org> wrote:

> From: Yulio Aleman Jimenez <yuli...@uci.cu>
> To: user@nutch.apache.org
> Cc:
> Date: Fri, 29 Apr 2016 16:47:32 -0400 (CDT)
> Subject: Priorize links in Fetching Step
>
> Hi.
>
> I'm using Nutch 1.9 with Solr 4.10 in a local environment.
> I need a way to prioritize some links in the fetching step, by
> filtering the new links identified in the last crawls according to some
> criteria, for example the extension of the resource. The goal is to
> prioritize images, documents, etc. before HTML pages in the crawling process.
>
> Is there any property in nutch-site.xml, or any plugin, capable of doing this?
> How can I do this?
>
> I welcome any suggestions, or source code snippets for creating a new
> plugin for Nutch.
>
> Best regards
>
> --
> Ing. Yulio Aleman Jimenez
> Dpto. Soluciones Informáticas para Internet. CIDI
> Universidad de las Ciencias Informáticas (UCI)

--
*Lewis*