Yes, this would limit the number of URLs from any one domain, but it would
not explain why one domain seems to get fetched more after recursive fetches
of some given seed set.
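
For reference, a minimal sketch of how such a per-host limit could be set,
assuming the override goes in conf/nutch-site.xml and using the property name
from Hannes' message below. The value 100 is an arbitrary illustration, not a
recommendation, and the property name may differ between Nutch releases, so
check conf/nutch-default.xml for your version first.

```xml
<!-- Fragment for conf/nutch-site.xml (assumed location); place inside <configuration>. -->
<property>
  <!-- Property name as quoted in this thread; -1 is described there as "unlimited". -->
  <name>generate.max.per.host</name>
  <!-- Hypothetical cap of 100 URLs per host in each generated fetch list. -->
  <value>100</value>
</property>
```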

Can you explain more about your crawling operation? Are you executing a
crawl command? If so, what arguments are you passing?

If not, can you give more detail about the job you are running?

Thank you

On Fri, Jul 8, 2011 at 2:50 PM, Hannes Carl Meyer <hannesc...@googlemail.com> wrote:

> Hi,
>
> you could set generate.max.per.host to a reasonable size to prevent this!
> On a default configuration this is set to -1 which means unlimited.
>
> BR
>
> Hannes
>
> ---
> Hannes Carl Meyer
> www.informera.de
>
> On Fri, Jul 8, 2011 at 2:53 PM, Eggebrecht, Thomas (GfK Marktforschung) <thomas.eggebre...@gfk.com> wrote:
>
> > Hi list,
> >
> > My seed list contains URLs from about 20 different domains. In the first
> > fetch cycles everything is all right and all domains will be selected quite
> > equally distributed. But after about 10-15 cycles one domain starts to
> > prevail. URLs from all other domains will not be selected anymore. It seems
> > that URLs from that certain domain have the highest scoring and URLs from
> > other domains don't have a chance anymore. Is this a right assumption?
> >
> > I'm not very happy because I would like to fetch URLs from all domains in
> > each cycle. What would you do in that case?
> >
> > Best regards and thanks for answers
> > Thomas
> >
> > (Using nutch-1.2)
> >
> >
> > GfK SE, Nuremberg, Germany, commercial register Nuremberg HRB 25014;
> > Management Board: Professor Dr. Klaus L. Wübbenhorst (CEO), Pamela Knapp
> > (CFO), Dr. Gerhard Hausruckinger, Petra Heinlein, Debra A. Pruent, Wilhelm
> > R. Wessels; Chairman of the Supervisory Board: Dr. Arno Mahlert
> > This email and any attachments may contain confidential or privileged
> > information. Please note that unauthorized copying, disclosure or
> > distribution of the material in this email is not permitted.
> >
>



-- 
*Lewis*
