Hi. I'm using Nutch 1.9 with Solr 4.10 in a local environment. I need a way to prioritize some links in the fetching step by filtering the new links discovered in the last crawls according to some criteria, for example the extension of the resource. The goal is to prioritize images, documents, etc. over HTML pages in the crawling process.
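Below is a minimal sketch of the kind of rule I have in mind. It is plain Java with no Nutch dependencies. The class name ExtensionPriority, the boost values and the extension lists are all hypothetical. The only assumption about Nutch is my understanding of the 1.x API: a scoring filter plugin can change the value the Generator sorts candidate URLs by (ScoringFilter.generatorSortValue). If that holds, multiplying that value by a per-extension boost should push documents and images ahead of HTML pages when topN is applied.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: boosts a URL's generator sort value by the extension of the resource. */
public class ExtensionPriority {

    // Hypothetical boost factors; extensions not listed keep their original sort value.
    private static final Map<String, Float> BOOSTS = new HashMap<>();
    static {
        for (String ext : new String[] {"pdf", "doc", "docx", "odt", "xls", "xlsx"}) BOOSTS.put(ext, 4.0f);
        for (String ext : new String[] {"jpg", "jpeg", "png", "gif"}) BOOSTS.put(ext, 3.0f);
    }

    /** Returns the lower-cased extension of the URL path (query and fragment ignored), or "" if none. */
    static String extensionOf(String url) {
        String path;
        try {
            path = new URI(url).getPath();
        } catch (URISyntaxException e) {
            return "";                       // malformed URL: treat as "no extension"
        }
        if (path == null) return "";         // opaque URIs such as mailto:
        String last = path.substring(path.lastIndexOf('/') + 1);  // last path segment only
        int dot = last.lastIndexOf('.');
        return dot < 0 ? "" : last.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Multiplies the value the URL would otherwise be sorted by with its extension boost. */
    static float sortValue(String url, float initSort) {
        Float boost = BOOSTS.get(extensionOf(url));
        return boost == null ? initSort : initSort * boost;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>(Arrays.asList(
            "http://example.com/index.html",
            "http://example.com/files/report.PDF?v=2",
            "http://example.com/img/logo.png",
            "http://example.com/about"));
        // Highest sort value first, i.e. the order in which URLs would be picked for fetching.
        urls.sort(Comparator.comparingDouble((String u) -> sortValue(u, 1.0f)).reversed());
        for (String u : urls) System.out.println(sortValue(u, 1.0f) + "  " + u);
    }
}
```

Inside a real plugin, I would expect generatorSortValue(Text url, CrawlDatum datum, float initSort) to return sortValue(url.toString(), initSort), with the plugin added to plugin.includes. The default regex-urlfilter.txt rejects common image suffixes (gif, jpg, png and others), so those rules would also need relaxing before any images are fetched.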
Is there any property in nutch-site.xml, or any existing plugin, that can do this? If not, how can I do it? I welcome any suggestions, or source code snippets for creating a new Nutch plugin.

Best regards,
--
Ing. Yulio Aleman Jimenez
Dpto. Soluciones Informáticas para Internet. CIDI
Universidad de las Ciencias Informáticas (UCI)