Hi Jean,

  db.ignore.external.links=true
should work. Which version of Nutch are you using?
How is the property set? Does your seed list only
contain URLs from mysite.com, and none from mysite.es?
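
For reference, a minimal sketch of how the property is usually set,
assuming the override lives in conf/nutch-site.xml (the file location
is an assumption about your setup, please check it against your
installation):

  <!-- assumed location: conf/nutch-site.xml, inside <configuration> -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>

As far as I understand it, with this setting outlinks that point to a
different host than the page they were found on are ignored, so
mysite.es should only show up if it is in the seed list or already in
the CrawlDb.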

Regards,
Sebastian

On 05/25/2016 11:44 AM, Jean Vence wrote:
> I am trying to crawl a single site and have set the
> db.ignore.external.links=true flag. But it seems to fail: the crawl
> also picks up sites with a different country extension. For example,
> if the seed is mysite.com, it will crawl mysite.com, mysite.es and
> mysite.it.
> 
> I don't want to use a regex to exclude them because I have multiple
> URLs and don't want to maintain a long list.
> 
> Is this a known bug?
> 
> Thanks,
> 
> Jean Vence
> 
