Thanks Sergey. I don't think I was clear about the issue: the subdomain I'm speaking of won't be found by the crawler, so I have to add it somehow. Given my original input URL of http://www.xyz.com/stuff, there is absolutely no way the crawler would know about http://abc.xyz.com/stuff, so I have to add the subdomain dynamically. I also don't have the option of simply adding 'http://abc.xyz.com/stuff' to my input file (an extra complication I don't want to bore you with!).
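[Editor's note: since the crawler never discovers the extra subdomain on its own, one way to "dynamically add" it is to expand the seed list before injecting it into Nutch. A minimal sketch in Python; the `abc` subdomain name and the `expand_seeds` helper are illustrative assumptions, not something from this thread or from Nutch itself:]

```python
from urllib.parse import urlparse, urlunparse

def expand_seeds(urls, extra_sub="abc"):
    """For each www.* seed URL, also emit the same path on an extra subdomain.

    The original URL is kept, so both www and the extra subdomain get crawled.
    """
    out = []
    for url in urls:
        parts = urlparse(url)
        out.append(url)
        if parts.netloc.startswith("www."):
            # e.g. www.xyz.com -> abc.xyz.com, path and scheme unchanged
            new_netloc = extra_sub + parts.netloc[len("www"):]
            out.append(urlunparse(parts._replace(netloc=new_netloc)))
    return out

seeds = expand_seeds(["http://www.xyz.com/stuff"])
# seeds == ["http://www.xyz.com/stuff", "http://abc.xyz.com/stuff"]
```

[The expanded list would then be written back to the seed directory before running the inject step, so the input file itself never has to change.]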
Thanks,
Peyman

On Sun, Nov 6, 2011 at 1:21 PM, Sergey A Volkov <[email protected]> wrote:
> Hi!
>
> I think you should use urlfilter-regex like "http://\w\.xyz\.com/stuff.*"
> instead of urlfilter-domain and set db.ignore.external.links to false, this
> will work, but this is quite slow if you have many regex.
>
> You may also try to add xyz.com to domain-suffixes.xml, this may cause some
> side effects, i had never tested this, just looked in DomainURLFilter
> source, so it's probably not really good idea.
>
> Sergey Volkov
>
> On Mon 07 Nov 2011 12:35:30 AM MSK, Peyman Mohajerian wrote:
>>
>> Hi Guys,
>>
>> Let's say my input file is:
>> http://www.xyz.com/stuff
>>
>> and I have thousands of these URLs in my input. How do I configure
>> Nutch to also crawl this subdomain for each input:
>> http://abc.xyz.com/stuff
>>
>> I don't want to just replace 'www' with 'abc' i want to crawl both.
>>
>> Thanks
>> Peyman
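[Editor's note: for reference, Sergey's urlfilter-regex suggestion would look roughly like the following. This is a sketch, not a tested configuration; `db.ignore.external.links` and the `+`/`-` rule syntax are standard Nutch, but the exact regex is an assumption (note `\w+` rather than Sergey's `\w`, since a subdomain is usually more than one character):]

In conf/nutch-site.xml:

```xml
<property>
  <name>db.ignore.external.links</name>
  <value>false</value>
</property>
```

In conf/regex-urlfilter.txt:

```
# accept any single-label subdomain of xyz.com under /stuff
+^http://\w+\.xyz\.com/stuff
# reject everything else
-.
```

[As Sergey notes, this can be slow with many regex rules, since every discovered URL is matched against each rule in order.]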

